Bitmask DP · study.

Every dynamic program so far has indexed its subproblems by something ordered: a prefix of an array, a position in a string, a capacity remaining. But some problems have no useful order. In the traveling salesman problem, the sequence in which the visited cities were reached is irrelevant; what matters is only the set of visited cities and the current city. The natural subproblem state is therefore a subset of the ground set, and a DP over subsets seems to need a table indexed by subsets, of which there are $2^{n}$ .

A subset of an $n$-element set is an $n$-bit string, hence an integer in $[0, 2^{n})$. So we encode the subset as the bits of an integer mask and index the DP table by that integer. When $n \lesssim 20$ the $2^{n}$ masks (up to about a million) fit comfortably in memory, and a $\Theta(n!)$ exhaustive search collapses to $O(2^{n} \cdot \mathrm{poly}(n))$.¹ This encoding builds on the general principles of dynamic programming, and enables a handful of recurring DP shapes.

Bit tricks, compactly

A mask is an integer whose bit $i$ (counting from $0$ , least significant) is $1$ iff element $i$ is in the set. The operations we need are all $O (1)$ :

test whether $i \in mask$ : (mask >> i) & 1.
add $i$ : mask | (1 << i); remove $i$ : mask & ~(1 << i).
isolate the lowest set bit: mask & -mask (two's complement makes $- mask = \sim mask + 1$ , which agrees with mask only at and below the lowest $1$ ).
popcount (number of set bits): a hardware instruction, or Kernighan's loop while (m) { m &= m - 1; c++; }, which clears the lowest set bit each step.
iterate the members: for $i$ from $0$ to $n - 1$ , test bit $i$ .
the full set is (1 << n) - 1; the empty set is 0.
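The operations in the list above can be tried directly in Python, whose integers support the same bit operators. This is a minimal sketch; the helper names `popcount` and `members` are illustrative choices, not part of any library.

```python
def popcount(mask: int) -> int:
  """Kernighan's loop: one iteration per set bit."""
  count = 0
  while mask:
    mask &= mask - 1  # clear the lowest set bit
    count += 1
  return count

def members(mask: int) -> list[int]:
  """Indices of the set bits of `mask`, lowest first."""
  result = []
  while mask:
    low = mask & -mask                   # isolate the lowest set bit
    result.append(low.bit_length() - 1)  # its index
    mask ^= low                          # remove it from the mask
  return result

n = 4
mask = 0b0101                        # the set {0, 2}
contains_2 = bool((mask >> 2) & 1)   # True
contains_1 = bool((mask >> 1) & 1)   # False
with_3 = mask | (1 << 3)             # 0b1101
without_0 = mask & ~(1 << 0)         # 0b0100
lowest = 0b1100 & -0b1100            # 0b0100
full_set = (1 << n) - 1              # 0b1111
```

Iterating members through the lowest set bit, as `members` does, costs one step per element present rather than one step per possible index.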

The whole universe of subsets is the integers $0, 1, \dots, 2^{n} - 1$ , so a loop for (mask = 0; mask < (1 << n); mask++) visits every subset once. Because mask | (1 << j) > mask whenever bit $j$ was unset, masks that add an element are numerically larger, so iterating masks in increasing order processes every subset after all of its proper subsets, giving the topological order a subset DP needs.

Figure: subset lattice for $n = 3$. Each edge adds one bit, so $\text{mask} \mid (1 \ll j) > \text{mask}$, and increasing integer order visits every subset after its proper subsets.
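The ordering claim is easy to confirm by brute force for a small ground set. The sketch below, assuming $n = 4$, collects every pair where a proper subset fails to come numerically first, and also shows that the converse does not hold: a smaller integer need not be a subset.

```python
n = 4

# Every (sub, mask) pair where sub is a proper subset of mask but is NOT
# numerically smaller. The list should come out empty.
violations = [
  (sub, mask)
  for mask in range(1 << n)
  for sub in range(1 << n)
  if sub != mask and (sub & mask) == sub and not sub < mask
]

# The converse fails: 0b011 < 0b100, yet {0, 1} is not a subset of {2}.
smaller_but_not_subset = 0b011 < 0b100 and (0b011 & 0b100) != 0b011
```

So increasing integer order is one valid topological order of the subset lattice, not the only one.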

Held–Karp: the traveling salesman archetype

The cleanest bitmask DP is Held–Karp for the traveling salesman problem, which is itself NP-hard so no polynomial algorithm is known. Given $n$ cities and pairwise distances $d (i, j)$ , find the shortest tour that starts at city $0$ , visits every city exactly once, and returns to $0$ . Brute force tries all $(n - 1)!$ orderings. The DP observation is that a partial tour is fully summarized by which cities it has visited and where it currently ends; the order in which the visited cities were reached is irrelevant to how cheaply we can finish.

The exponential savings come from state sharing: many distinct visit orders reach the very same $(mask, i)$ , and the DP keeps only the cheapest, solving each state once instead of re-exploring every permutation.

Figure: two visit orders, one state. Both $0 \to 1 \to 2 \to 3$ and $0 \to 2 \to 1 \to 3$ visit $\{0, 1, 2, 3\}$ and end at $3$, so Held–Karp folds them into the single state $dp[1111][3]$ and keeps only the cheaper, collapsing the factorial of orders.
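To see the folding numerically, one can enumerate every partial path from city $0$ and count how many distinct $(\text{mask}, i)$ states they land on. This is an illustrative sketch; the function name `count_paths_and_states` is hypothetical.

```python
from itertools import permutations

def count_paths_and_states(n: int) -> tuple[int, int]:
  """Count partial paths from city 0 and the distinct (mask, end) states."""
  paths = 0
  states: set[tuple[int, int]] = set()
  for length in range(n):  # cities visited after the start
    for order in permutations(range(1, n), length):
      paths += 1
      mask = 1  # city 0 is always visited
      for city in order:
        mask |= 1 << city
      end = order[-1] if order else 0
      states.add((mask, end))
  return paths, len(states)

print(count_paths_and_states(5))  # (65, 33)
```

The gap widens quickly: the path count grows factorially while the state count grows like $n \cdot 2^{n}$.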

Figure: $dp[\text{mask}][i]$ = best path visiting the cities in mask, ending at $i$, extended to a new city $j$.

The recurrence extends a path ending at $i$ to a new, previously-unvisited city $j$ by setting bit $j$ :

$$dp[\text{mask} \mid (1 \ll j)][j] = \min_{i \in \text{mask},\ j \notin \text{mask}} \bigl( dp[\text{mask}][i] + d(i, j) \bigr).$$

The base case is $dp[\{0\}][0] = 0$. The answer closes the tour back to the start over the full mask $F = 2^{n} - 1$:

$$\text{cost} = \min_{i} \bigl( dp[F][i] + d(i, 0) \bigr).$$
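The recurrence can also be read top-down: the best path ending at $i$ over mask is the best path over mask without $i$, ending at some predecessor, plus one hop. The following is a memoized sketch of that reading, assuming a small dense distance matrix; `held_karp_top_down` and `best` are illustrative names, and the sample matrix is the one used in the worked instance.

```python
from functools import lru_cache
from math import inf

def held_karp_top_down(d: list[list[float]]) -> float:
  """Cheapest closed tour from city 0, by memoized recursion on (mask, i)."""
  n = len(d)
  if n <= 1:
    return 0.0
  full = (1 << n) - 1

  @lru_cache(maxsize=None)
  def best(mask: int, i: int) -> float:
    # Cheapest path from city 0 visiting exactly `mask` and ending at i.
    if i == 0:
      return 0.0 if mask == 1 else inf  # only the start state ends at 0
    prev_mask = mask & ~(1 << i)
    return min(
      (best(prev_mask, p) + d[p][i]
       for p in range(n) if (prev_mask >> p) & 1),
      default=inf,
    )

  return min(best(full, i) + d[i][0] for i in range(1, n))

sample = [
  [0, 2, 9, 10],
  [2, 0, 6, 4],
  [9, 6, 0, 8],
  [10, 4, 8, 0],
]
print(held_karp_top_down(sample))  # 23.0
```

The top-down form only touches states reachable from the full mask, at the price of recursion overhead; the bottom-up loop over masks does the same work iteratively.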

Algorithm: \textsc{Held-Karp}(d, n), shortest tour by bitmask DP, $O(2^n n^2)$

$dp[\text{mask}][i] \gets \infty$ for all mask, $i$
$dp[1][0] \gets 0$    // only city $0$ visited, at $0$
for $\text{mask} \gets 1$ to $2^n - 1$ do
  for $i \gets 0$ to $n - 1$ do
    if $dp[\text{mask}][i] = \infty$ then continue
    if $\text{mask}$ does not contain $i$ then continue
    for $j \gets 0$ to $n - 1$ do
      if $\text{mask}$ contains $j$ then continue    // visited
      $\text{nmask} \gets \text{mask} \mid (1 \ll j)$
      $dp[\text{nmask}][j] \gets \min\left(dp[\text{nmask}][j],\ dp[\text{mask}][i] + d(i,j)\right)$
$\textbf{return } \min_i \left(dp[2^n - 1][i] + d(i, 0)\right)$

Iterating masks in increasing order is valid because $\text{nmask} > \text{mask}$, so every state is finalized before it is read. The table has $2^{n} \cdot n$ entries and each is relaxed by an inner loop over $n$ candidate predecessors, giving $O(2^{n} n^{2})$ time and $O(2^{n} n)$ space. That is exponential, but $2^{n} n^{2}$ at $n = 18$ is about $10^{8}$, entirely feasible, whereas $17! \approx 3.6 \times 10^{14}$ is not.
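The arithmetic behind that comparison can be checked in a few lines; the variable names are illustrative.

```python
from math import factorial

n = 18
dp_steps = (1 << n) * n * n             # 84_934_656, roughly 10^8
table_entries = (1 << n) * n            # 4_718_592 stored states
brute_force_orders = factorial(n - 1)   # 355_687_428_096_000, about 3.6e14

# The DP does several million times less work than trying every ordering.
speedup = brute_force_orders // dp_steps
```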

A worked instance

Take $n = 4$ cities with the symmetric distance matrix

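$$d = \begin{array}{c|cccc} & 0 & 1 & 2 & 3 \\ \hline 0 & 0 & 2 & 9 & 10 \\ 1 & 2 & 0 & 6 & 4 \\ 2 & 9 & 6 & 0 & 8 \\ 3 & 10 & 4 & 8 & 0 \end{array}$$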

Masks are $4$-bit integers with bit $0$ always set (the tour starts at city $0$). The base case is $dp[0001][0] = 0$. Processing masks in increasing integer order, each state $dp[\text{mask}][i]$ takes the cheapest predecessor $i'$ in $\text{mask} \setminus \{i\}$, and we store that $i'$ as a back-pointer. The reachable entries fill in as follows (only finite entries shown; mask printed in binary with bit $0$ rightmost):

mask	ends at $i$	$d p$	realizing transition
$0011$	$1$	$2$	$d p [0001] [0] + d (0, 1) = 0 + 2$
$0101$	$2$	$9$	$d p [0001] [0] + d (0, 2) = 0 + 9$
$1001$	$3$	$10$	$d p [0001] [0] + d (0, 3) = 0 + 10$
$0111$	$1$	$15$	$d p [0101] [2] + d (2, 1) = 9 + 6$
$0111$	$2$	$8$	$d p [0011] [1] + d (1, 2) = 2 + 6$
$1011$	$1$	$14$	$d p [1001] [3] + d (3, 1) = 10 + 4$
$1011$	$3$	$6$	$d p [0011] [1] + d (1, 3) = 2 + 4$
$1101$	$2$	$18$	$d p [1001] [3] + d (3, 2) = 10 + 8$
$1101$	$3$	$17$	$d p [0101] [2] + d (2, 3) = 9 + 8$
$1111$	$1$	$21$	$min$ over predecessors; winner $d p [1101] [3] + d (3, 1) = 17 + 4$
$1111$	$2$	$14$	$d p [1011] [3] + d (3, 2) = 6 + 8$
$1111$	$3$	$16$	$d p [0111] [2] + d (2, 3) = 8 + 8$

Each $d p [1111] [i]$ took the minimum over its in-mask predecessors; the rightmost column names the winning one, whose city index is the stored back-pointer. Closing each full-mask entry back to city $0$ gives $min_{i} (d p [1111] [i] + d (i, 0))$ :

$$\begin{aligned} i = 1 &: \quad 21 + d(1, 0) = 21 + 2 = 23, \\ i = 2 &: \quad 14 + d(2, 0) = 14 + 9 = 23, \\ i = 3 &: \quad 16 + d(3, 0) = 16 + 10 = 26. \end{aligned}$$

The minimum tour cost is $23$, attained by both $i = 1$ and $i = 2$. Reading back-pointers from $i = 2$ ($dp[1111][2] = 14$, predecessor $3$; $dp[1011][3] = 6$, predecessor $1$; $dp[0011][1] = 2$, predecessor $0$) recovers the tour $0 \to 1 \to 3 \to 2 \to 0$ with edge costs $2 + 4 + 8 + 9 = 23$. Starting the traceback at $i = 1$ instead recovers the reverse tour $0 \to 2 \to 3 \to 1 \to 0$ at the same cost, as expected: with symmetric distances a tour and its reversal cost the same.

Figure: traceback on the worked instance. Back-pointers from the closing argmin $dp[1111][2]$ walk down to lower masks, peeling off one city per step and rebuilding the tour $0 - 1 - 3 - 2 - 0$ of cost $23$.
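With only $3! = 6$ orderings, the worked instance is small enough to cross-check by exhaustive search. The sketch below assumes the distance matrix given above; `tour_cost` is an illustrative helper.

```python
from itertools import permutations

d = [
  [0, 2, 9, 10],
  [2, 0, 6, 4],
  [9, 6, 0, 8],
  [10, 4, 8, 0],
]

def tour_cost(order: tuple[int, ...]) -> int:
  """Cost of the closed tour 0 -> order... -> 0."""
  path = (0, *order, 0)
  return sum(d[a][b] for a, b in zip(path, path[1:]))

costs = {order: tour_cost(order) for order in permutations(range(1, 4))}
best = min(costs.values())
optimal = sorted(order for order, cost in costs.items() if cost == best)

print(best)     # 23
print(optimal)  # [(1, 3, 2), (2, 3, 1)]
```

The two optimal orderings are exactly the tour found by the traceback and its reversal.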

The same $(\text{mask}, i)$ state also solves Shortest Path Visiting All Nodes. There every edge has unit cost, and the walk may start anywhere, may revisit nodes, and need not return to its start, so instead of filling the table in mask order we run a BFS over states $(\text{mask}, i)$, where mask is the set of nodes visited so far. The answer is the BFS depth at which any state with $\text{mask} = F$ is first reached. Unlike the single-source shortest paths of weighted graphs, here the path must cover every node, which is what forces the subset state.

held_karp.py (Python)

from math import inf
from typing import Sequence

# A distance matrix: distances[source][target] is the cost of that hop.
DistanceMatrix = Sequence[Sequence[float]]

def held_karp(distances: DistanceMatrix) -> float:
  """
    The cost of the cheapest tour that starts at city 0, visits every
    city exactly once, and returns to city 0. Returns 0.0 for a single
    city and inf when no closing tour exists.
  """
  city_count: int = len(distances)
  if city_count <= 1:
    return 0.0

  full_mask: int = (1 << city_count) - 1

  # best_cost[mask][end] = cheapest path from 0 visiting exactly `mask`,
  # ending at city `end`. inf marks a state not yet reachable.
  best_cost: list[list[float]] = [
    [inf for _ in range(city_count)] for _ in range(1 << city_count)
  ]
  best_cost[1][0] = 0.0

  # Increasing mask order is a valid topological order: adding a city
  # only ever produces a numerically larger mask.
  for mask in range(1 << city_count):
    for end in range(city_count):

      # skip states where `end` isn't visited or isn't yet reachable.
      current: float = best_cost[mask][end]
      if current == inf or not (mask >> end) & 1:
        continue

      # extend the path to each unvisited city, relaxing that state.
      for nxt in range(city_count):
        if (mask >> nxt) & 1:
          continue
        next_mask: int = mask | (1 << nxt)
        candidate: float = current + distances[end][nxt]
        best_cost[next_mask][nxt] = min(best_cost[next_mask][nxt], candidate)

  # Close the tour by returning to city 0 from each possible last city.
  return min(
    best_cost[full_mask][end] + distances[end][0]
    for end in range(city_count)
  )

def held_karp_tour(distances: DistanceMatrix) -> tuple[float, list[int]]:
  """
    Like `held_karp`, but also reconstructs an optimal ordering of
    cities (always starting at 0). Returns (cost, tour); the tour lists
    each city once and does not repeat the start at the end.
  """
  city_count: int = len(distances)
  if city_count <= 1:
    return 0.0, list(range(city_count))

  full_mask: int = (1 << city_count) - 1
  best_cost: list[list[float]] = [
    [inf for _ in range(city_count)] for _ in range(1 << city_count)
  ]

  # parent[mask][end] is the city visited just before `end` on the
  # cheapest path realizing (mask, end), or -1 at the start.
  parent: list[list[int]] = [
    [-1 for _ in range(city_count)] for _ in range(1 << city_count)
  ]
  best_cost[1][0] = 0.0

  for mask in range(1 << city_count):
    for end in range(city_count):

      # skip states where `end` isn't visited or isn't yet reachable.
      current: float = best_cost[mask][end]
      if current == inf or not (mask >> end) & 1:
        continue

      # extend to each unvisited city, recording the predecessor on improve.
      for nxt in range(city_count):
        if (mask >> nxt) & 1:
          continue
        next_mask: int = mask | (1 << nxt)
        candidate: float = current + distances[end][nxt]
        if candidate < best_cost[next_mask][nxt]:
          best_cost[next_mask][nxt] = candidate
          parent[next_mask][nxt] = end

  # close the tour to city 0, keeping the cheapest last city.
  best_cost_total: float = inf
  last_city: int = -1
  for end in range(city_count):
    closed: float = best_cost[full_mask][end] + distances[end][0]
    if closed < best_cost_total:
      best_cost_total = closed
      last_city = end

  # walk the parent pointers back to the start, then reverse.
  tour: list[int] = []
  mask: int = full_mask
  city: int = last_city
  while city != -1:
    tour.append(city)
    previous: int = parent[mask][city]
    mask &= ~(1 << city)
    city = previous

  tour.reverse()
  return best_cost_total, tour

shortest_path_visiting_all_nodes.py (Python)

from collections import deque
from typing import NamedTuple, Sequence

# adjacency[node] lists the neighbors reachable from `node`.
Adjacency = Sequence[Sequence[int]]

class State(NamedTuple):
  """
    A BFS state: the set of visited nodes and the current position.
  """
  mask: int
  node: int

def shortest_path_visiting_all_nodes(adjacency: Adjacency) -> int:
  """
    The fewest edges in a walk that visits every node at least once.
    The walk may start and end anywhere and may revisit nodes. Returns
    0 for a graph with one (or no) node.
  """
  node_count: int = len(adjacency)
  if node_count <= 1:
    return 0

  full_mask: int = (1 << node_count) - 1

  # A walk may start at any node; seed one state per starting node.
  queue: deque[tuple[State, int]] = deque()
  seen: set[State] = set()
  for start in range(node_count):
    state = State(1 << start, start)
    queue.append((state, 0))
    seen.add(state)

  while queue:

    # the first state to cover every node is shortest (BFS by edge count).
    state, distance = queue.popleft()
    if state.mask == full_mask:
      return distance

    # step to each neighbor, enqueueing states we haven't reached before.
    for neighbor in adjacency[state.node]:
      next_state = State(state.mask | (1 << neighbor), neighbor)
      if next_state in seen:
        continue
      seen.add(next_state)
      queue.append((next_state, distance + 1))

  # Disconnected graph: no walk covers every node.
  return -1

Assignment and matching by mask

The same idea solves the assignment problem: assign $n$ tasks to $n$ workers, where worker $k$ doing task $j$ costs $c [k] [j]$ , minimizing total cost. Here the mask tracks which tasks have been assigned. If we always assign workers in order $0, 1, 2, \dots$ , then once mask is fixed the number of workers already placed is determined: it is exactly $popcount (mask)$ .

The transition gives worker $k$ each task $j$ not yet used:

$$dp[\text{mask} \mid (1 \ll j)] = \min_{j \notin \text{mask}} \bigl( dp[\text{mask}] + c[k][j] \bigr), \qquad k = \text{popcount}(\text{mask}).$$

Figure: assignment by mask. With $\text{mask} = 0110$ two tasks are taken, so $\text{popcount} = 2$ fixes worker $k = 2$ as next, branching to a free task ($0$ or $3$).

With base $d p [0] = 0$ and answer $d p [2^{n} - 1]$ , there are $2^{n}$ states each with $n$ transitions: $O (2^{n} n)$ time, $O (2^{n})$ space, a clean factor of $n$ cheaper than Held–Karp because the current position dimension is replaced by the free information $popcount (mask)$ . Maximum Students Taking Exam is a row-by-row variant: the per-row state is a bitmask of seated columns, and a row's choice is constrained by the previous row's mask plus broken seats.
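The popcount trick reads just as naturally top-down. The sketch below is a memoized cost-to-go variant (it stores the cheapest way to finish from a mask, rather than the cheapest way to reach it), checked against brute force over permutations; the function name and the sample costs are made up for illustration.

```python
from functools import lru_cache
from itertools import permutations

def assignment_top_down(cost: list[list[int]]) -> int:
  """Least total cost of a one-to-one assignment, memoized on the task mask."""
  n = len(cost)
  full = (1 << n) - 1

  @lru_cache(maxsize=None)
  def best(mask: int) -> int:
    # Cheapest way to finish once the tasks in `mask` are already taken.
    if mask == full:
      return 0
    worker = bin(mask).count("1")  # next worker index = tasks already taken
    return min(
      cost[worker][task] + best(mask | (1 << task))
      for task in range(n) if not (mask >> task) & 1
    )

  return best(0)

sample_cost = [
  [9, 2, 7],
  [6, 4, 3],
  [5, 8, 1],
]
brute_force = min(
  sum(sample_cost[w][t] for w, t in enumerate(perm))
  for perm in permutations(range(3))
)
print(assignment_top_down(sample_cost), brute_force)  # 9 9
```

Because the worker index is recomputed from the mask, the memo key is the mask alone, which is where the factor of $n$ saving over Held–Karp comes from.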

assignment_mask.py (Python)

from math import inf
from typing import Sequence

# cost[worker][task] is the cost of giving `task` to `worker`.
CostMatrix = Sequence[Sequence[float]]

def minimum_assignment_cost(cost: CostMatrix) -> float:
  """
    The least total cost of a perfect one-to-one assignment of workers
    to tasks. Assumes a square matrix; returns 0.0 when there are no
    workers.
  """
  worker_count: int = len(cost)
  if worker_count == 0:
    return 0.0

  full_mask: int = (1 << worker_count) - 1

  # best_cost[mask] = least cost of assigning workers 0..popcount(mask)-1
  # to exactly the tasks in `mask`.
  best_cost: list[float] = [inf for _ in range(1 << worker_count)]
  best_cost[0] = 0.0

  for mask in range(1 << worker_count):

    # skip unreached masks and the all-tasks-placed mask.
    current: float = best_cost[mask]
    worker: int = bin(mask).count("1")
    if current == inf or worker == worker_count:
      continue

    # give the next worker (index = tasks taken) each free task.
    for task in range(worker_count):
      if (mask >> task) & 1:
        continue
      next_mask: int = mask | (1 << task)
      candidate: float = current + cost[worker][task]
      best_cost[next_mask] = min(best_cost[next_mask], candidate)

  return best_cost[full_mask]

def assignment(cost: CostMatrix) -> tuple[float, list[int]]:
  """
    Like `minimum_assignment_cost`, but also returns the chosen task for
    each worker: assignment[worker] is the task index given to it.
    Returns (total_cost, assignment).
  """
  worker_count: int = len(cost)
  if worker_count == 0:
    return 0.0, []

  full_mask: int = (1 << worker_count) - 1
  best_cost: list[float] = [inf for _ in range(1 << worker_count)]
  best_cost[0] = 0.0

  # chosen_task[mask] records which task the last-placed worker took to
  # reach `mask`, letting us unwind the assignment afterward.
  chosen_task: list[int] = [-1 for _ in range(1 << worker_count)]

  for mask in range(1 << worker_count):

    # skip unreached masks and the all-tasks-placed mask.
    current: float = best_cost[mask]
    worker: int = bin(mask).count("1")
    if current == inf or worker == worker_count:
      continue

    # give the next worker each free task, recording it on improvement.
    for task in range(worker_count):
      if (mask >> task) & 1:
        continue
      next_mask: int = mask | (1 << task)
      candidate: float = current + cost[worker][task]
      if candidate < best_cost[next_mask]:
        best_cost[next_mask] = candidate
        chosen_task[next_mask] = task

  # unwind: peel the last-placed task off the mask, worker by worker.
  result: list[int] = [-1 for _ in range(worker_count)]
  mask: int = full_mask
  for worker in range(worker_count - 1, -1, -1):
    task: int = chosen_task[mask]
    result[worker] = task
    mask &= ~(1 << task)

  return best_cost[full_mask], result

Subset-sum partitioning over masks

Partition to K Equal Sum Subsets asks whether a multiset can be split into $k$ groups of equal sum $S = (\sum a_{i}) / k$ . Treat the chosen elements as a mask and carry just enough state to describe the current group's fill.

Concretely $dp[\text{mask}]$ stores the used capacity of the current bucket, defined when mask is reachable. From a reachable mask with partial fill $r = dp[\text{mask}]$, we may add any element $i \notin \text{mask}$ whose value $a_{i}$ keeps the bucket within $S$:

$$\text{if } r + a_{i} \leq S: \quad dp[\text{mask} \mid (1 \ll i)] \leftarrow (r + a_{i}) \bmod S,$$

where hitting exactly $S$ rolls over to $0$ and opens a fresh bucket. The whole set is partitionable iff $d p [2^{n} - 1] = 0$ (every bucket closed exactly). This is $O (2^{n} n)$ , the same shape as assignment: $2^{n}$ masks, $n$ candidate elements per mask, $O (1)$ feasibility check.
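One consequence of the rollover rule is worth seeing concretely: whenever a mask is reachable, its stored fill is simply the sum of its elements modulo $S$, so the table value does not depend on the order in which elements were added. The sketch below checks that invariant on a made-up instance; `bucket_fills` and `subset_sum` are illustrative names, and the values are assumed positive.

```python
from typing import Optional

def bucket_fills(numbers: list[int], target: int) -> list[Optional[int]]:
  """fill[mask] = current bucket's used capacity, or None if unreachable."""
  n = len(numbers)
  fill: list[Optional[int]] = [None] * (1 << n)
  fill[0] = 0
  for mask in range(1 << n):
    current = fill[mask]
    if current is None:
      continue
    for i, value in enumerate(numbers):
      if (mask >> i) & 1 or current + value > target:
        continue
      fill[mask | (1 << i)] = (current + value) % target
  return fill

numbers = [4, 3, 2, 3, 5, 2, 1]
target = sum(numbers) // 4  # four buckets of sum 5
fill = bucket_fills(numbers, target)
full = (1 << len(numbers)) - 1

def subset_sum(mask: int) -> int:
  return sum(v for i, v in enumerate(numbers) if (mask >> i) & 1)

partitionable = fill[full] == 0   # True: e.g. {5}, {4, 1}, {3, 2}, {3, 2}
unreachable = fill[0b11] is None  # True: 4 and 3 overflow any single bucket
```

This is why overwriting a table entry reached along two different paths is harmless: both paths write the same value.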

partition_k_equal_sum.py (Python)

from typing import Optional, Sequence

def can_partition_k_equal_sum(numbers: Sequence[int], buckets: int) -> bool:
  """
    Whether `numbers` can be split into exactly `buckets` groups of
    equal sum. Numbers are non-negative; `buckets` is at least 1.
  """
  if buckets <= 0:
    return False

  total: int = sum(numbers)
  if buckets == 1:
    return True
  if total % buckets != 0:
    return False

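  target: int = total // buckets

  # A zero target (empty or all-zero input) would make the modulo below
  # divide by zero. Assuming groups must be non-empty, such input splits
  # evenly exactly when there are at least `buckets` elements.
  if target == 0:
    return len(numbers) >= buckets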

  # An element heavier than a bucket can never fit anywhere.
  if any(value > target for value in numbers):
    return False

  element_count: int = len(numbers)
  full_mask: int = (1 << element_count) - 1

  # bucket_fill[mask] = used capacity of the current bucket once the
  # elements in `mask` are placed; None marks `mask` as unreachable.
  bucket_fill: list[Optional[int]] = [None for _ in range(1 << element_count)]
  bucket_fill[0] = 0

  for mask in range(1 << element_count):

    # skip subsets we never reached.
    fill: Optional[int] = bucket_fill[mask]
    if fill is None:
      continue

    # try adding each unused element that still fits the current bucket.
    for index in range(element_count):
      if (mask >> index) & 1 or fill + numbers[index] > target:
        continue

      # closing a bucket exactly rolls the fill over to zero.
      next_mask: int = mask | (1 << index)
      bucket_fill[next_mask] = (fill + numbers[index]) % target

  return bucket_fill[full_mask] == 0

Submask enumeration and the $3^{n}$ bound

Some subset DPs need, for each mask, to consider every way of splitting it into two complementary parts, for instance partitioning a set of cities into groups each served by one route. That requires iterating over all submasks of a given mask. The idiom is:

Algorithm: enumerate every submask $\text{sub} \subseteq \text{mask}$

$\text{sub} \gets \text{mask}$
while $\text{sub} > 0$ do
  // ... use sub and its complement ...
  $\text{sub} \gets (\text{sub} - 1) \mathbin{\&} \text{mask}$
// empty submask 0 handled after loop

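Subtracting $1$ from sub borrows through its trailing zeros; & mask then snaps the result back inside mask, so the sequence steps through the submasks in strictly decreasing order down to $0$. The total cost across all masks is smaller than the naive bound suggests. Each (mask, sub) pair gives every element one of three roles: in sub, in mask but not in sub, or outside mask. The roles are chosen independently per element, so there are exactly $3^{n}$ pairs in total.

Figure: a mask, its submasks via the (sub - 1) & mask step, and the three roles each element plays.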
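A short trace makes the stepping visible. This sketch records each value `sub` takes for one mask; `submask_trace` is an illustrative name.

```python
def submask_trace(mask: int) -> list[int]:
  """Every submask of `mask` in the order the idiom visits them."""
  steps = []
  sub = mask
  while sub > 0:
    steps.append(sub)
    sub = (sub - 1) & mask  # borrow, then snap back inside mask
  steps.append(0)           # the empty submask, handled after the loop
  return steps

trace = submask_trace(0b1011)
print([format(s, "04b") for s in trace])
# ['1011', '1010', '1001', '1000', '0011', '0010', '0001', '0000']
```

Bit $2$ is never set in any step because it is not in the mask, and the eight values are the $2^{3}$ subsets of the three bits that are.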

So a DP that, for every mask, loops over all its submasks runs in $O (3^{n})$ , not the $O (4^{n})$ a naive all masks $\times$ all masks bound would suggest. Find the Shortest Superstring and Minimum Cost to Connect Two Groups of Points are submask/mask DPs of this flavor: the latter builds the answer by, for each mask of right-group points, choosing how to cover it given a left point. When the transition is a sum (or min/max) over all submasks of each mask with a fixed contribution per submask, there is an even faster $O (2^{n} n)$ technique, SOS DP (sum over subsets), covered in the DP-optimizations lesson, that beats the $3^{n}$ enumeration by sharing work across masks.²
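The $3^{n}$ total also follows by grouping masks by size. A mask with $k$ set bits has $2^{k}$ submasks and there are $\binom{n}{k}$ such masks, so the binomial theorem gives:

```latex
\sum_{\text{mask} \subseteq [n]} 2^{|\text{mask}|}
  = \sum_{k=0}^{n} \binom{n}{k}\, 2^{k}
  = (1 + 2)^{n}
  = 3^{n}.
```

For $n = 20$ this is about $3.5 \times 10^{9}$ steps, against roughly $1.1 \times 10^{12}$ for the naive $4^{n}$ pairing.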

submask_enumeration.py (Python)

from math import inf
from typing import Callable, Iterator

# A cost charged to a single group, given as its bitmask of members.
GroupCost = Callable[[int], float]

def submasks(mask: int) -> Iterator[int]:
  """
    Yield every submask of `mask`, including `mask` itself and the empty
    submask 0, in strictly decreasing order.
  """
  sub: int = mask
  while sub > 0:
    yield sub
    sub = (sub - 1) & mask
  yield 0

def proper_nonempty_submasks(mask: int) -> Iterator[int]:
  """
    Yield every submask of `mask` that is neither empty nor all of
    `mask`; useful when a split must put something in both parts.
  """
  for sub in submasks(mask):
    if sub != 0 and sub != mask:
      yield sub

def count_submask_pairs(element_count: int) -> int:
  """
    The total number of (mask, submask) pairs over all masks of an
    `element_count`-element set, computed by direct enumeration. It
    equals 3 ** element_count, the bound the enumeration achieves.
  """
  # count every submask of every mask by direct enumeration.
  return sum(
    1 for mask in range(1 << element_count) for _ in submasks(mask)
  )

def minimum_partition_cost(
  element_count: int,
  group_cost: GroupCost,
) -> float:
  """
    Partition the ground set {0, .., element_count-1} into one or more
    disjoint groups, minimizing the summed `group_cost` over the groups.
    Each group is identified by its bitmask. This is the canonical
    submask DP and runs in O(3^n) by enumerating, for each mask, the
    ways to peel off one group as a submask.
  """
  full_mask: int = (1 << element_count) - 1

  # best_cost[mask] = least total cost of partitioning exactly `mask`.
  best_cost: list[float] = [inf for _ in range(1 << element_count)]
  best_cost[0] = 0.0

  for mask in range(1, 1 << element_count):

    # anchor on the lowest set bit so every partition is counted once.
    lowest_bit: int = mask & -mask
    sub: int = mask

    # peel off the group holding that bit, costing it plus the remainder.
    while sub > 0:
      if sub & lowest_bit:
        remaining: int = mask ^ sub
        candidate: float = group_cost(sub) + best_cost[remaining]
        best_cost[mask] = min(best_cost[mask], candidate)
      sub = (sub - 1) & mask

  return best_cost[full_mask]

The reach and the ceiling of subset DP

Held and Karp published their tour DP in 1962 (Held and Karp, A dynamic programming approach to sequencing problems, J. SIAM 10), and its $O (2^{n} n^{2})$ bound has never been beaten in the worst case — sixty years on, no $O ((2 - ε)^{n})$ exact TSP algorithm is known, and finding one would be a major result. Whether it can be beaten is tied to the Strong Exponential Time Hypothesis (SETH); under SETH, several natural covering problems, including some subset DPs here, have no meaningfully faster exact algorithm, so the exponential is likely unavoidable.³

There is one large class where the $2^{n}$ ceiling drops. When the subset DP is a DP over a graph of small pathwidth or treewidth, the mask need only track the boundary between processed and unprocessed vertices rather than all vertices; Maximum Students Taking Exam exploits this, keeping only the previous row's seating mask. The general statement is Bodlaender's theorem and the connectivity DPs over tree decompositions, and the modern refinement is Cut & Count (Cygan, Nederlof, Pilipczuk, and coauthors, 2011), which uses randomization and the isolation lemma to solve connectivity problems like Hamiltonicity in $O^{*}(c^{tw})$ for a small constant $c$, a single-exponential dependence on treewidth where the straightforward connectivity-tracking DP needs $tw^{O(tw)}$.

The submask/superset machinery in the last section is the entry point to the subset-sum / subset-convolution toolkit. The $3^{n}$ submask enumeration is the brute-force version; the SOS transform (the DP-optimizations lesson) drops the sum over all submasks case to $O (2^{n} n)$ , and Björklund, Husfeldt, Kaski, and Koivisto's fast subset convolution (2007) composes two set functions in $O (2^{n} n^{2})$ , which is what makes exact graph coloring runnable in $O^{*} (2^{n})$ rather than $O^{*} (3^{n})$ . Beyond these, the practical frontier is heuristic: real routing and scheduling at scale use branch-and-cut on integer programs (the Concorde TSP solver has proved optimal tours on tens of thousands of cities) and metaheuristics like Lin–Kernighan, none of which change the worst-case exponent but all of which make the exponential tractable on the instances that actually arise.
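For orientation only, here is what the SOS transform mentioned above looks like in its simplest form. This is a minimal sketch, not the treatment from the DP-optimizations lesson: it assumes the goal is, for every mask, the plain sum of a value over all of that mask's submasks, and the name `sum_over_subsets` is illustrative.

```python
def sum_over_subsets(values: list[int], n: int) -> list[int]:
  """total[mask] = sum of values[sub] over every sub that is a subset of mask."""
  total = list(values)
  for bit in range(n):
    for mask in range(1 << n):
      if mask & (1 << bit):
        # Fold in the submasks that differ from `mask` by dropping `bit`.
        total[mask] += total[mask ^ (1 << bit)]
  return total

values = [1, 2, 3, 4, 5, 6, 7, 8]  # values[mask] for n = 3
totals = sum_over_subsets(values, 3)
print(totals[0b101])  # 14 = values[0] + values[1] + values[4] + values[5]
print(totals[0b111])  # 36 = sum of all eight values
```

Processing one bit at a time lets each mask reuse partial sums already computed for its neighbors, which is how the cost falls from $3^{n}$ to $2^{n} n$.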

Takeaways

Bitmask DP applies when a subproblem's state is which subset of a small ground set ($n \lesssim 20$) has been used; encode the subset as the bits of an integer and index the table by that mask, giving $2^{n}$ states.
The core bit operations (test (mask >> i) & 1, set mask | (1<<i), clear mask & ~(1<<i), lowest bit mask & -mask, popcount) are all $O (1)$ , and iterating masks in increasing order is a valid topological order for subset DPs.
Held–Karp solves TSP with $dp[\text{mask}][i]$ = best path visiting mask ending at $i$, in $O(2^{n} n^{2})$ time and $O(2^{n} n)$ space versus $\Theta(n!)$ brute force; the same $(\text{mask}, i)$ state, explored by BFS, solves Shortest Path Visiting All Nodes.
Assignment and subset-sum partitioning use $dp[\text{mask}]$ alone in $O(2^{n} n)$: assignment reads the next worker off $\text{popcount}(\text{mask})$, and partitioning stores the current bucket's fill as the table value.
Submask enumeration via sub = (sub - 1) & mask iterates every submask of every mask in total $O (3^{n})$ , because each element is independently in sub, in the complement, or in neither.

¹ Erickson, Dynamic Programming chapter: exponential subset DPs trade $\Theta(n!)$ exhaustive search for $O(2^{n} \cdot \mathrm{poly}(n))$ by memoizing over subsets.
² CLRS, Ch. 15, Dynamic Programming (§15.3): optimal substructure and overlapping subproblems, the two ingredients that justify indexing subproblems by subset masks.
³ Held & Karp (1962, J. SIAM 10) for the $O(2^{n} n^{2})$ tour DP; Cygan, Nederlof, Pilipczuk, Pilipczuk, van Rooij, Wojtaszczyk (2011, FOCS), Solving connectivity problems parameterized by treewidth in single exponential time (Cut & Count); Björklund et al. (2007, STOC) fast subset convolution enabling $O^{*}(2^{n})$ exact graph coloring.
