Dynamic Programming/Knapsack & Subset Problems

Lesson 8.42,288 words

Knapsack & Subset Problems

We start from Subset-sum\textsc{Subset-sum} — does some sublist hit a target tt? — and its include/exclude recurrence over a boolean table A(i,u)A(i, u), then bolt on values to get 0/1 knapsack as the same machine with \lor promoted to max\max.

╌╌╌╌

A thief breaks into a warehouse carrying a knapsack that holds at most units of weight. Item weighs and is worth . Each item is taken whole or left behind, with no fractions and no duplicates. Which subset maximizes the value carried out without exceeding the capacity? This is the 0/1 knapsack problem, and despite its toy framing it is one of the most important problems in computing: it is the kernel of resource allocation, budgeting, and cutting-stock problems, and a canonical optimization problem whose dynamic program teaches a subtle lesson about what polynomial really means.1

We will arrive at knapsack through its simpler decision cousin, , which strips away the values and asks a plain yes/no question. The include/exclude recurrence is identical, the running-time subtlety is identical, and subset-sum is the cleanest place to see both.

Subset-sum: the decision core

The brute-force space is the sublists. To get a recurrence we shrink the instance one element at a time, and the trick that supplies the second dimension is to make the target itself a parameter of the subproblem.

The answer we want is . Now look at , the last element we are allowed to use, and make the include/exclude decision that defines every 0/1 dynamic program:

  • Exclude . Then some sublist of the first elements must already hit on its own: .
  • Include . Then it contributes toward the target, and the first elements must cover the shortfall: .

The element can be used in either way, so the subproblem is true if either branch succeeds, a logical . The base cases pin down the boundary: a budget of is always reachable by the empty set; a positive budget is unreachable with no elements; a negative budget is impossible:

Every entry depends only on two entries of the previous row (): the one directly above (exclude) and the one above and columns to the left (include). So we fill the table row by row in increasing , sweeping the budget within each row.

The subset-sum table, filled

Take and target . The table holds the boolean : row is only in column (the empty set sums to ), and each later row ORs the exclude cell directly above with the include cell columns to its left. The shaded cell is the answer; the two arrows show its include/exclude dependency.

Boolean subset-sum table with the answer cell and its include and exclude predecessors.

The answer is , witnessed by or . Reading its two dashed predecessors: exclude asks (no subset of hits ), while include asks ; the makes the cell true through the include branch.

Subset-sum in pseudocode

Algorithm 1:Subset-Sum(L[1..n],t)\textsc{Subset-Sum}(L[1..n], t) — does some sublist sum to tt?
  1. 1
    A[0..n][0..t]falseA[0..n][0..t] \gets \text{false}
  2. 2
    A[0..n][0]trueA[0..n][0] \gets \text{true}
    empty set reaches 00
  3. 3
    for i1i \gets 1 to nn do
  4. 4
    for u1u \gets 1 to tt do
  5. 5
    if ai>ua_i > u then
  6. 6
    A[i][u]A[i1][u]A[i][u] \gets A[i-1][u]
    aia_i too big: exclude
  7. 7
    else
  8. 8
    A[i][u]A[i1][u]A[i1][uai]A[i][u] \gets A[i-1][u] \lor A[i-1][u - a_i]
    exclude or include
  9. 9
    return A[n][t]A[n][t]

This fills boolean cells in apiece, for time and space, and the space drops to because each row reads only the one above it. We will return to that innocent-looking below, since it hides a trap. First, let us see that 0/1 knapsack is the same machine with values bolted on.

The problem and why greed fails

Knapsack is subset-sum with two upgrades: each item carries a separate value as well as a weight , and the capacity is an upper bound rather than an exact target, so we maximize value instead of answering yes/no. The include/exclude decision is unchanged: the of subset-sum becomes a , and the boolean cell becomes a value. (Set and the answer recovers subset-sum exactly, a reduction we make precise below.)

The tempting greedy idea, taking items in decreasing order of value-per-weight ratio , is wrong for the 0/1 problem. Suppose and the items are , , . Item has the best ratio ( per unit) so greed grabs it first, then nothing else fits, for a total value of . But taking and together fills the knapsack exactly for value . Greed is fooled because a locally efficient item can block a globally better combination; the 0/1 problem lacks the structure that lets greedy choices succeed on matroids. We need to consider subsets, and that calls for dynamic programming.

Why greedy fails for 0/1 knapsack (). Best ratio first grabs () and then nothing fits: value . Taking and instead fills capacity exactly for value .

The high-ratio item is a trap: it is locally efficient yet blocks the pair that packs the knapsack with no waste.

The subproblem and recurrence

The brute-force space is the subsets. To get a recurrence we need a subproblem definition that shrinks the instance one decision at a time. The key insight, the source of the second dimension, is that we must track not only which items remain available but how much capacity is left.

The answer is . Now consider item , the last one we are allowed to use, and make the same include/exclude decision as in subset-sum; this is the 0/1 in the name.

Exclude item . Then the best we can do is whatever the first items achieve within the same budget: .

Include item . This is only possible if it fits, . We collect its value and spend of the budget, leaving for the first items: .

We take the better of the two, and if item does not fit, only the first option is available:2

The base case says: with no items, or no capacity, the value is . Each entry depends only on entries in the previous row (), so we fill the table row by row in increasing , and within each row over all budgets .

The DP table, filled

Take capacity and four items: , , , . The table holds ; row is all zeros (no items), and each later row applies the recurrence across budgets through .

Filled 0/1 knapsack value table with include and exclude arrows into the answer.

The shaded answer is . The arrows trace the highlighted entry : item weighs , so we compare excluding it (, the cell directly above) against including it (, reaching columns to the left); the include branch wins at . One row down, compares excluding item () against including it (), a tie at .

The algorithm

Algorithm 2:Knapsack-01(w,v,n,W)\textsc{Knapsack-01}(w, v, n, W) — maximum value within capacity WW
  1. 1
    for b0b \gets 0 to WW do
  2. 2
    K[0][b]0K[0][b] \gets 0
    no items: value 00
  3. 3
    for i1i \gets 1 to nn do
  4. 4
    for b0b \gets 0 to WW do
  5. 5
    K[i][b]K[i1][b]K[i][b] \gets K[i-1][b]
    skip item ii
  6. 6
    if w[i]bw[i] \le b then
  7. 7
    takev[i]+K[i1][bw[i]]take \gets v[i] + K[i-1][b - w[i]]
  8. 8
    K[i][b]max(K[i][b], take)K[i][b] \gets \max(K[i][b],\ take)
    take item ii
  9. 9
    return K[n][W]K[n][W]

Recovering the chosen items. As with every DP, the table holds the value; the subset is recovered by walking backward from . At row , if then item was taken: record it and drop the budget to ; otherwise it was skipped. Continue down to row .

Algorithm 3:Knapsack-Items(K,w,n,W)\textsc{Knapsack-Items}(K, w, n, W) — recover the optimal subset
  1. 1
    SS \gets \emptyset
  2. 2
    bWb \gets W
  3. 3
    for ini \gets n downto 11 do
  4. 4
    if K[i][b]K[i1][b]K[i][b] \neq K[i-1][b] then
  5. 5
    SS{i}S \gets S \cup \set{i}
    item ii taken
  6. 6
    bbw[i]b \gets b - w[i]
  7. 7
    return SS

On the filled table above (), the walk starts at and reads off one decision per row, dropping the budget by whenever an item is taken:

Backward reconstruction from . At each row, means item was skipped (budget unchanged); a jump means it was taken (budget drops by ). Items and are recovered, weight , value .

Running time and the pseudo-polynomial trap

The table has entries, each filled in , so runs in

time and space. (Space drops to if only the value is needed, since each row reads only the previous one, so scan from high to low to reuse a single array.)

The descending sweep is not a detail but the whole correctness argument for the 1-D version: when we apply an item, must still hold the value from before this item was available, so we must reach before we overwrite — that is, go high to low.

0/1 knapsack on one rolling array, applying item () with the budget swept (descending). Each update reads before it is overwritten, so item is used at most once.

Sweeping the other way, low to high, would let already include item , silently packing it twice — exactly the reuse that the unbounded knapsackwants, and that the ascending sweep there deliberately permits.

That looks polynomial, and here lies one of the most important subtleties in all of algorithms.

Is this really polynomial time? It is tempting to call (or ) polynomial: the input is a list of numbers, so surely its size is , and is polynomial in . That accounting is wrong, and seeing why is what matters here. We have to count the input's size in bits, not in numbers. This is the same bit-length accounting that underlies asymptotic analysis.

Now rewrite the running time against . Since can be as large as ,

which is exponential in , the number of bits of a single input integer. Doubling the bits used to write the target squares and thus squares the running time. An algorithm whose time is polynomial in the numeric value of an input but exponential in its encoded length is called pseudo-polynomial: it counts as polynomial only if we dishonestly pretend an integer has size rather than its true size .3 Such running times are central to coping with hardness.

So the DP is not a flaw in our analysis but the best we currently know how to do. It is fast and practical precisely when the numbers are small (a target in the thousands, say), and uselessly slow when is a -bit integer, even though both instances have the same handful of elements. The lesson generalizes: any DP whose table is indexed by a numeric quantity (a target sum, a capacity) inherits this pseudo-polynomial character. The same pattern recurs in coin change and the unbounded knapsack.

Subset-sum as a special case

We opened with subset-sum and built knapsack on top of it; the reduction also runs the other way, and making it precise shows the two problems are the same machine. Take any subset-sum instance with target , and feed it to 0/1 knapsack with each item's value equal to its weight, , and capacity . The most value you can pack into a capacity- knapsack is at most , and it equals exactly when some subset of weights sums to without waste. So

and one call to answers subset-sum. The two recurrences are the same shape with the of the boolean table promoted to a over values, which is why everything transfers: time, pseudo-polynomial, and / NP-hard in general.

Where greed does work: fractional knapsack

Change one rule, letting items be split so that we may take any fraction of item , and the problem fractional knapsack becomes easy, solvable by the very greedy strategy that failed the 0/1 version. Sort items by value-per-weight ratio and take them greedily, highest ratio first, slicing the last item to fill the knapsack exactly.

Algorithm 4:Fractional-Knapsack(w,v,n,W)\textsc{Fractional-Knapsack}(w, v, n, W) — greedy, fractions allowed
  1. 1
    sort items so that v[1]/w[1]v[2]/w[2]v[n]/w[n]v[1]/w[1] \ge v[2]/w[2] \ge \cdots \ge v[n]/w[n]
  2. 2
    value0value \gets 0; bWb \gets W
    remaining capacity
  3. 3
    for i1i \gets 1 to nn do
  4. 4
    if w[i]bw[i] \le b then
  5. 5
    valuevalue+v[i]value \gets value + v[i]
    take all of item ii
  6. 6
    bbw[i]b \gets b - w[i]
  7. 7
    else
  8. 8
    valuevalue+v[i](b/w[i])value \gets value + v[i] \cdot (b / w[i])
    fraction fills exactly
  9. 9
    return valuevalue
  10. 10
    return valuevalue

This runs in , since the sort dominates, and is provably optimal by an exchange argument: any optimal solution can be rearranged, without losing value, to match the greedy choice. The contrast is the moral of the lesson.

Takeaways

  • and 0/1 knapsack share one engine: a two-dimensional subproblem ( / ) indexed by items available and budget remaining, because how much budget is left is part of the state.
  • The recurrence is the include/exclude choice on the last element: a of two previous-row cells for subset-sum, a for knapsack. Each cell depends on the one directly above (exclude) and the one / columns to the left (include); the optimal subset is recovered by walking backward.
  • / is pseudo-polynomial: polynomial in the numeric value of the target, but with an honest size of bits it is , exponential in the bit length of one integer.
  • A truly polynomial algorithm is unlikely: is , so one exists only if .
  • Fractional knapsack flips to a greedy algorithm: allowing fractions restores the greedy-choice property that indivisibility destroys, so one rule change crosses the line from NP-hard to easy.

Footnotes

  1. Skiena, §10 — Dynamic Programming: 0/1 knapsack as a canonical NP-hard resource-allocation problem solved by DP.
  2. CLRS, Ch. 15 — Dynamic Programming: the 0/1 knapsack value recurrence taking the max of exclude and include.
  3. Erickson, Ch. 3 — Dynamic Programming: the pseudo-polynomial running time, exponential in the input's bit length.
Practice

╌╌ END ╌╌