Dynamic Programming/DP Optimizations

Lesson 8.91,418 words

DP Optimizations

A correct DP recurrence is only half the battle; its naive evaluation is often a factor of nn slower than necessary. This capstone surveys five techniques, monotonic-queue, the convex hull trick, divide-and-conquer optimization, Knuth's optimization, and SOS DP, that each exploit structure in the transition (a sliding window, linear costs, monotone optimal splits, the quadrangle inequality, or subset lattices) to shave an O(n)O(n), O(logn)O(\log n), or worse factor off the running time.

╌╌╌╌

The previous lessons built DP recurrences and trusted their dimensions to give the running time: a table of states, each filled by a transition that scans predecessors, costs . Often is itself , a min over all earlier states or a split point ranging over an interval, and the honest recurrence runs in or . The techniques in this lesson all share one move: they observe that the transition is not an arbitrary min over predecessors but one with structure, and they maintain an auxiliary object (a deque, a hull of lines, a monotone split pointer) that answers each transition faster than a fresh scan.1 None of them changes what the DP computes, only how fast it computes it.

A useful test runs through the whole lesson: look at the shape of the inner min/max and ask what stays constant and what slides as the outer index advances. The answer names the technique.

The cost model here is the state-count times transition-work product; these techniques attack the second factor.

Monotonic-queue optimization

Consider a transition of the form

where each state takes a min (or max) of the previous states within a sliding window of width , then adds a term depending only on . Evaluated directly this is : every state rescans its window. But the window's left edge only ever moves right, and its right edge only ever moves right, so this is exactly the sliding-window minimum problem, which a monotonic deque solves in amortized per step.2

Algorithm:MonoQueue-DP\textsc{MonoQueue-DP} — evaluate dp[i]=minikj<idp[j]+cost(i)dp[i]=\min_{i-k\le j<i}dp[j]+\text{cost}(i) in O(n)O(n)
  1. 1
    QQ \gets empty deque of indices
    dpdp increasing front\toback
  2. 2
    dp[0]basedp[0] \gets \text{base}; push 00 onto QQ
  3. 3
    for i1i \gets 1 to nn do
  4. 4
    while QQ nonempty and front(Q)<ikfront(Q) < i-k do
  5. 5
    pop front of QQ
    left the window
  6. 6
    dp[i]dp[front(Q)]+cost(i)dp[i] \gets dp[front(Q)] + \text{cost}(i)
  7. 7
    while QQ nonempty and dp[back(Q)]dp[i]dp[back(Q)] \ge dp[i] do
  8. 8
    pop back of QQ
    ii dominates older candidate
  9. 9
    push ii onto QQ

Each index is pushed and popped at most once, so the total work is , down from . Jump Game VI is the canonical instance: is the best score reachable at index , equal to of for in the last positions, plus : a sliding-window max, the same deque with the inequality flipped. Constrained Subsequence Sum is the same recurrence with and a reset. This is the monotonic-stack idea from the sequences module, extended to a deque so that both ends move.

a sliding window over the row; the deque front holds the window optimum

Convex hull trick

Now suppose each previous state contributes a line and the transition queries the best line at a point:

where the slope and intercept depend only on (typically is a function of and the problem data), and depends only on . The naive evaluation is . But over a fixed set of lines, as a function of , is the lower envelope of those lines, a convex, piecewise-linear curve. Only the lines on the lower hull can ever be optimal; the rest are dominated everywhere. Maintaining that hull and querying it is the Convex Hull Trick.1

each state is a line; the optimal is the lower hull queried at

If lines are inserted in monotone slope order and queries are also monotone, both operations are amortized: push lines onto a stack-like hull, popping any that the newcomer makes redundant, and advance a pointer for queries. Without monotonicity, store the hull and binary-search for the optimal line at in , or use a Li Chao tree. Either way the DP drops from to .

Divide-and-conquer optimization

For a layered transition

let be the smallest achieving that minimum. If is monotone in (that is, for every fixed layer ), then the search range for column is bounded by the answers of its neighbors, and we can solve a whole layer by divide and conquer:

Algorithm:DC-Opt(i,jlo,jhi,klo,khi)\textsc{DC-Opt}(i, j_{lo}, j_{hi}, k_{lo}, k_{hi}) — fill layer ii, columns [jlo,jhi][j_{lo},j_{hi}]
  1. 1
    if jlo>jhij_{lo} > j_{hi} then return
  2. 2
    jmid(jlo+jhi)/2j_{mid} \gets \lfloor (j_{lo}+j_{hi})/2 \rfloor
  3. 3
    bestbest \gets \infty; optkloopt \gets k_{lo}
  4. 4
    for kklok \gets k_{lo} to min(jmid1,khi)\min(j_{mid}-1,\,k_{hi}) do
  5. 5
    if dp[i1][k]+C(k,jmid)<bestdp[i-1][k] + C(k,j_{mid}) < best then
  6. 6
    bestdp[i1][k]+C(k,jmid)best \gets dp[i-1][k] + C(k,j_{mid}); optkopt \gets k
  7. 7
    dp[i][jmid]bestdp[i][j_{mid}] \gets best
  8. 8
    DC-Opt(i, jlo, jmid1, klo, opt)\textsc{DC-Opt}(i,\ j_{lo},\ j_{mid}-1,\ k_{lo},\ opt)
  9. 9
    DC-Opt(i, jmid+1, jhi, opt, khi)\textsc{DC-Opt}(i,\ j_{mid}+1,\ j_{hi},\ opt,\ k_{hi})

Solve the middle column first by scanning its full allowed -range; its optimum then caps the left half's search and floors the right half's, so the two recursive calls split both the columns and the candidate range:

Divide-and-conquer optimization. Solving finds ; by monotonicity the left columns search only and the right only , halving the column span while the -ranges overlap only at .

At each recursion depth the -ranges across all sub-calls overlap by at most their endpoints, so one depth costs ; there are depths per layer and layers, giving instead of . The monotonicity of is the hypothesis you must verify; it holds whenever satisfies the quadrangle inequality (below), but is sometimes provable directly from the problem.

Knuth's optimization

Interval DPs have the shape

and naively cost : intervals, each scanning split points. Knuth's optimization applies when satisfies the quadrangle inequality (QI) and is monotone on intervals:

When QI holds, the optimal split point is monotone in both arguments:

So when filling we only scan split points in rather than all of . Summed over a fixed interval length, those ranges telescope, and the total work collapses to . This is the optimization behind optimal binary search trees and the cost-merging part of matrix-chain multiplication from the interval-DP lesson: both have cost functions satisfying QI, so each yields to Knuth and runs in .

Knuth's optimization: filling scans only , a window pinned by two already-known optimal splits, not all of

SOS DP (sum over subsets)

The last technique is combinatorial rather than geometric. Given a value for every bitmask over bits, we want, for each mask , an aggregate over all of its submasks:

Enumerating every submask of every mask costs (the classic submask-enumeration bound). Sum over subsets does it in by adding one bit-dimension at a time: process bits , and when processing bit , fold each mask that has bit set into the version without it. It is a multidimensional prefix sum over the hypercube .

SOS DP folds one bit at a time over the hypercube : when processing bit , each mask with bit set absorbs along that axis
Algorithm:SOS\textsc{SOS} — submask sums for all masks in O(n2n)O(n\,2^n)
  1. 1
    for m0m \gets 0 to 2n12^n-1 do
  2. 2
    g[m]f[m]g[m] \gets f[m]
  3. 3
    for b0b \gets 0 to n1n-1 do
  4. 4
    for m0m \gets 0 to 2n12^n-1 do
  5. 5
    if m&(1b)0m \mathbin{\&} (1 \ll b) \ne 0 then
  6. 6
    g[m]g[m]+g[m(1b)]g[m] \gets g[m] + g[m \oplus (1 \ll b)]

After processing bit , holds the sum of over all submasks of that differ from only in bits ; after all bits, it is the full submask sum. Replacing the order of the two loops, or flipping the bit test, gives sums over supersets instead. This is what drives the bitmask-DP lesson's harder counting problems: anything that asks you to aggregate over all subsets of every state at once.

Choosing the technique

TechniqueTransition shapeComplexity win
Monotonic queue (sliding window)
Convex hull trick (line per state)
Divide & conquer, monotone
Knuth, QI
SOS DP (submask aggregate)

Takeaways

  • These are not new DPs but faster evaluations of an existing recurrence; the trigger is always structure in the inner min/max, not its mere size.
  • Monotonic-queue optimization: a sliding-window min/max transition runs in via a monotonic deque whose front is the window optimum — the natural tool for Jump Game VI and Constrained Subsequence Sum.
  • Convex hull trick: when each state is a line and the transition queries the lower envelope at , maintain the hull and query in (or amortized when slopes and queries are monotone), turning into .
  • Divide-and-conquer optimization needs a monotone optimal split; Knuth's optimization gets that monotonicity for interval DPs from the quadrangle inequality, dropping to on problems like optimal BST and matrix-chain.
  • SOS DP aggregates over every submask of every mask in by summing one bit-dimension at a time — a prefix sum over the subset lattice.
  • Verify the applicability condition (window monotonicity, linear cost, monotone split, QI) before reaching for the speedup; the optimization is only correct when its structural hypothesis holds.

Footnotes

  1. CP-Algorithms, — DP optimizations: the convex hull trick, divide-and-conquer, and Knuth optimizations treated as transition-acceleration techniques over a fixed recurrence. 2
  2. Skiena, § — Dynamic Programming: recognizing that a DP's cost is the product of state count and per-state transition work, and attacking the transition.
Practice

╌╌ END ╌╌