Knapsack & Subset Problems
We start from — does some sublist hit a target ? — and its include/exclude recurrence over a boolean table , then bolt on values to get 0/1 knapsack as the same machine with promoted to .
╌╌╌╌
A thief breaks into a warehouse carrying a knapsack that holds at most units
of weight. Item weighs and is worth . Each item is taken whole or
left behind, with no fractions and no duplicates. Which subset maximizes the value
carried out without exceeding the capacity? This is the 0/1 knapsack problem,
and despite its toy framing it is one of the most important problems in
computing: it is the kernel of resource allocation, budgeting, and cutting-stock
problems, and a canonical optimization problem whose dynamic program
teaches a subtle lesson about what polynomial
really means.1
We will arrive at knapsack through its simpler decision cousin, , which strips away the values and asks a plain yes/no question. The include/exclude recurrence is identical, the running-time subtlety is identical, and subset-sum is the cleanest place to see both.
Subset-sum: the decision core
The brute-force space is the sublists. To get a recurrence we shrink the instance one element at a time, and the trick that supplies the second dimension is to make the target itself a parameter of the subproblem.
The answer we want is . Now look at , the last element we are allowed to use, and make the include/exclude decision that defines every 0/1 dynamic program:
- Exclude . Then some sublist of the first elements must already hit on its own: .
- Include . Then it contributes toward the target, and the first elements must cover the shortfall: .
The element can be used in either way, so the subproblem is true if either branch succeeds, a logical . The base cases pin down the boundary: a budget of is always reachable by the empty set; a positive budget is unreachable with no elements; a negative budget is impossible:
Every entry depends only on two entries of the previous row (): the one directly above (exclude) and the one above and columns to the left (include). So we fill the table row by row in increasing , sweeping the budget within each row.
The subset-sum table, filled
Take and target . The table holds the boolean : row is only in column (the empty set sums to ), and each later row ORs the exclude cell directly above with the include cell columns to its left. The shaded cell is the answer; the two arrows show its include/exclude dependency.
The answer is , witnessed by or . Reading its two dashed predecessors: exclude asks (no subset of hits ), while include asks ; the makes the cell true through the include branch.
Subset-sum in pseudocode
- 1
- 2empty set reaches
- 3for to do
- 4for to do
- 5if then
- 6too big: exclude
- 7else
- 8exclude or include
- 9return
This fills boolean cells in apiece, for time and space, and the space drops to because each row reads only the one above it. We will return to that innocent-looking below, since it hides a trap. First, let us see that 0/1 knapsack is the same machine with values bolted on.
The problem and why greed fails
Knapsack is subset-sum with two upgrades: each item carries a separate value as well as a weight , and the capacity is an upper bound rather than an exact target, so we maximize value instead of answering yes/no. The include/exclude decision is unchanged: the of subset-sum becomes a , and the boolean cell becomes a value. (Set and the answer recovers subset-sum exactly, a reduction we make precise below.)
The tempting greedy idea, taking items in decreasing order of value-per-weight ratio , is wrong for the 0/1 problem. Suppose and the items are , , . Item has the best ratio ( per unit) so greed grabs it first, then nothing else fits, for a total value of . But taking and together fills the knapsack exactly for value . Greed is fooled because a locally efficient item can block a globally better combination; the 0/1 problem lacks the structure that lets greedy choices succeed on matroids. We need to consider subsets, and that calls for dynamic programming.
The high-ratio item is a trap: it is locally efficient yet blocks the pair that packs the knapsack with no waste.
The subproblem and recurrence
The brute-force space is the subsets. To get a recurrence we need a subproblem definition that shrinks the instance one decision at a time. The key insight, the source of the second dimension, is that we must track not only which items remain available but how much capacity is left.
The answer is . Now consider item , the last one we are allowed to
use, and make the same include/exclude decision as in subset-sum; this is the
0/1
in the name.
Exclude item . Then the best we can do is whatever the first items achieve within the same budget: .
Include item . This is only possible if it fits, . We collect its value and spend of the budget, leaving for the first items: .
We take the better of the two, and if item does not fit, only the first option is available:2
The base case says: with no items, or no capacity, the value is . Each entry depends only on entries in the previous row (), so we fill the table row by row in increasing , and within each row over all budgets .
The DP table, filled
Take capacity and four items: , , , . The table holds ; row is all zeros (no items), and each later row applies the recurrence across budgets through .
The shaded answer is . The arrows trace the highlighted entry : item weighs , so we compare excluding it (, the cell directly above) against including it (, reaching columns to the left); the include branch wins at . One row down, compares excluding item () against including it (), a tie at .
The algorithm
- 1for to do
- 2no items: value
- 3for to do
- 4for to do
- 5skip item
- 6if then
- 7
- 8take item
- 9return
Recovering the chosen items. As with every DP, the table holds the value; the subset is recovered by walking backward from . At row , if then item was taken: record it and drop the budget to ; otherwise it was skipped. Continue down to row .
- 1
- 2
- 3for downto do
- 4if then
- 5item taken
- 6
- 7return
On the filled table above (), the walk starts at and reads off one decision per row, dropping the budget by whenever an item is taken:
Running time and the pseudo-polynomial trap
The table has entries, each filled in , so runs in
time and space. (Space drops to if only the value is needed, since each row reads only the previous one, so scan from high to low to reuse a single array.)
The descending sweep is not a detail but the whole correctness argument for the 1-D version: when we apply an item, must still hold the value from before this item was available, so we must reach before we overwrite — that is, go high to low.
Sweeping the other way, low to high, would let already include item , silently packing it twice — exactly the reuse that the unbounded knapsackwants, and that the ascending sweep there deliberately permits.
That looks polynomial, and here lies one of the most important subtleties in all of algorithms.
Is this really polynomial time
? It is tempting to call (or
) polynomial: the input is a list of numbers, so surely its
size is , and is polynomial in . That accounting is
wrong, and seeing why is what matters here. We have to count the input's size in
bits, not in numbers. This is the same bit-length accounting that underlies
asymptotic analysis.
Now rewrite the running time against . Since can be as large as ,
which is exponential in , the number of bits of a single input integer.
Doubling the bits used to write the target squares and thus squares the
running time. An algorithm whose time is polynomial in the numeric value of an
input but exponential in its encoded length is called pseudo-polynomial: it
counts as polynomial
only if we dishonestly pretend an integer has size
rather than its true size .3 Such running
times are central to
coping with hardness.
So the DP is not a flaw in our analysis but the best we currently know how to do. It is fast and practical precisely when the numbers are small (a target in the thousands, say), and uselessly slow when is a -bit integer, even though both instances have the same handful of elements. The lesson generalizes: any DP whose table is indexed by a numeric quantity (a target sum, a capacity) inherits this pseudo-polynomial character. The same pattern recurs in coin change and the unbounded knapsack.
Subset-sum as a special case
We opened with subset-sum and built knapsack on top of it; the reduction also runs the other way, and making it precise shows the two problems are the same machine. Take any subset-sum instance with target , and feed it to 0/1 knapsack with each item's value equal to its weight, , and capacity . The most value you can pack into a capacity- knapsack is at most , and it equals exactly when some subset of weights sums to without waste. So
and one call to answers subset-sum. The two recurrences are the same shape with the of the boolean table promoted to a over values, which is why everything transfers: time, pseudo-polynomial, and / NP-hard in general.
Where greed does work: fractional knapsack
Change one rule, letting items be split so that we may take any fraction of item , and the problem fractional knapsack becomes easy, solvable by the very greedy strategy that failed the 0/1 version. Sort items by value-per-weight ratio and take them greedily, highest ratio first, slicing the last item to fill the knapsack exactly.
- 1sort items so that
- 2;remaining capacity
- 3for to do
- 4if then
- 5take all of item
- 6
- 7else
- 8fraction fills exactly
- 9return
- 10return
This runs in , since the sort dominates, and is provably optimal by an exchange argument: any optimal solution can be rearranged, without losing value, to match the greedy choice. The contrast is the moral of the lesson.
Takeaways
- and 0/1 knapsack share one engine: a two-dimensional subproblem ( / ) indexed by items available and budget remaining, because how much budget is left is part of the state.
- The recurrence is the include/exclude choice on the last element: a of two previous-row cells for subset-sum, a for knapsack. Each cell depends on the one directly above (exclude) and the one / columns to the left (include); the optimal subset is recovered by walking backward.
- / is pseudo-polynomial: polynomial in the numeric value of the target, but with an honest size of bits it is , exponential in the bit length of one integer.
- A truly polynomial algorithm is unlikely: is , so one exists only if .
- Fractional knapsack flips to a greedy algorithm: allowing fractions restores the greedy-choice property that indivisibility destroys, so one rule change crosses the line from NP-hard to easy.
Footnotes
- Skiena, §10 — Dynamic Programming: 0/1 knapsack as a canonical NP-hard resource-allocation problem solved by DP. ↩
- CLRS, Ch. 15 — Dynamic Programming: the 0/1 knapsack value recurrence taking the max of exclude and include. ↩
- Erickson, Ch. 3 — Dynamic Programming: the pseudo-polynomial running time, exponential in the input's bit length. ↩
╌╌ END ╌╌