Dynamic Programming/Longest Increasing Subsequence

Lesson 8.31,678 words

Longest Increasing Subsequence

Given a sequence of numbers, how long is its longest strictly increasing subsequence? A first dynamic program indexes subproblems by the element each subsequence ends at, giving an O(n2)O(n^2) solution with parent-pointer reconstruction.

╌╌╌╌

The previous lesson aligned two sequences. Now we ask a question about a single one. Given an array of numbers, a longest increasing subsequence (LIS) is a longest set of positions whose values strictly increase, . As with LCS, the word is subsequence, not substring: the chosen elements keep their original order but need not be contiguous. In the subsequence increases and has length , and no longer one exists, so the LIS length is .1

Plot the values against their positions and an LIS is a longest chain of points that climbs as it moves right, never stepping down:

Values of plotted against position. An LIS is a longest left-to-right chain that strictly rises; the highlighted path (length ) is one such chain.

A brute-force scan over all subsequences is hopeless. But LIS has clean optimal substructure, the same optimal-substructure lever behind every dynamic program, and it admits two algorithms worth knowing well: a direct dynamic program, and a slicker method built on patience sorting.

The dynamic program: subsequences that end here

The decisive modeling move, the analog of LCS's index by prefix, is to index each subproblem by the element its subsequence is forced to end at. Anchoring the endpoint is what makes the pieces compose: an increasing subsequence ending at is some shorter increasing subsequence ending at an earlier, smaller element, with tacked on.

Every increasing subsequence ends somewhere, so the answer is , not , a small but important distinction from the prefix DPs, where the answer sat in the last cell.

To extend a subsequence so that it ends at , look at all earlier indices whose value is smaller than ; any increasing subsequence ending at such a can be lengthened by appending . We take the best such predecessor, or start fresh with just if none exists:

The guarantees the is defined and yields when no smaller predecessor exists: the subsequence consisting of alone.

Each scans the earlier indices, so the fill is time and space.

Algorithm 1:LIS-Quadratic(a[1..n])\textsc{LIS-Quadratic}(a[1..n]) — length and parent pointers
  1. 1
    for i1i \gets 1 to nn do
  2. 2
    L[i]1L[i] \gets 1 ;  prev[i]nil\ \mathit{prev}[i] \gets \text{nil}
    singleton
  3. 3
    for j1j \gets 1 to i1i - 1 do
  4. 4
    if a[j]<a[i]a[j] < a[i] and L[j]+1>L[i]L[j] + 1 > L[i] then
  5. 5
    L[i]L[j]+1L[i] \gets L[j] + 1
  6. 6
    prev[i]j\mathit{prev}[i] \gets j
    best predecessor
  7. 7
    bestargmaxiL[i]\mathit{best} \gets \arg\max_i L[i]
  8. 8
    return L[best]L[\mathit{best}] and the chain best,prev[best],\mathit{best}, \mathit{prev}[\mathit{best}], \dots

Reconstruction. The array is a forest of parent pointers: to recover an actual LIS, find the index maximizing , then follow backwards until it hits nil, reversing the collected indices. This costs , cheap beside the fill.

Parent pointers for . Following back from the best endpoint (index , ) yields .

The blue endpoints form the recovered chain: starting at the best cell (index , value ) and hopping along visits indices , whose values reverse to the LIS .

One chain (indices ), arrows along the cell tops. The dotted arrows show equally-long alternatives — starting at the (index ), or ending at the (index ) instead of the — so the LIS is not unique.

The solid chain (indices ) realizes at index , and the row beneath records every . A subtlety worth naming: the predecessor of is not unique — both the (index ) and the (index ) are smaller and end a length- run, so either could precede it. Reconstruction is greedy: it stores and follows a single parent, and we take the earlier index, giving . The dotted arrows mark the other choices: extending the at index instead yields , and stopping at the (index , also ) yields . Whenever has ties or several endpoints reach the maximum length, the LIS is simply not unique.

The method: patience sorting and the tails array

The quadratic inner loop searches all earlier indices for the best predecessor. The trick that removes it is to stop tracking endpoints individually and instead maintain, for each achievable length, the single most useful witness.

The whole algorithm is one pass. For each element , binary-search for the first entry (a lower_bound). If one is found, overwrite it with ; if none is (so exceeds every tail), append . The LIS length is the final length of .

Algorithm 2:LIS-Patience(a[1..n])\textsc{LIS-Patience}(a[1..n]) — tails array, O(nlogn)O(n\log n)
  1. 1
    tails\mathit{tails} \gets empty array
  2. 2
    for i1i \gets 1 to nn do
  3. 3
    kLower-Bound(tails, a[i])k \gets \textsc{Lower-Bound}(\mathit{tails},\ a[i])
    first tails[k]a[i]\mathit{tails}[k] \ge a[i]
  4. 4
    if k=tailsk = |\mathit{tails}| then
  5. 5
    append a[i]a[i] to tails\mathit{tails}
    exceeds all tails: extend
  6. 6
    else
  7. 7
    tails[k]a[i]\mathit{tails}[k] \gets a[i]
    shrink length-(k+1)(k{+}1) tail
  8. 8
    return tails|\mathit{tails}|

Each element costs one binary search, for total.

Why this is correct

Two facts carry the whole proof. First, is always sorted in increasing order, so binary search is even legal. Second, the updates never reduce any achievable length, so the array's length tracks the true LIS.

Sortedness. A length- increasing subsequence is a length- one with one more element appended, so the minimum tail of length strictly exceeds that of length : . An overwrite preserves this. When we replace by , the lower_bound guarantees (else would have been the found index) and , so the array stays strictly increasing.

Overwriting never hurts. Replacing with a smaller value can only help: any future element that could have extended a length- subsequence ending at the old, larger tail can still extend the one ending at , since is smaller. So no achievable length is ever lost: overwrites only make tails more permissive, never shorter. And appending when it beats every tail records a genuinely new, longer subsequence. Hence equals the LIS length.

Patience sorting / : the tails array as elements arrive.

Each row marks the overwrite-or-append position in acc. Adding overwrites the (the first tail ): the length- run now ends at the smaller value , leaving room for future growth without ever shortening the array.

Reconstruction in . As written, holds values, not positions, so it loses the actual subsequence. To recover it, store indices: let hold the position whose value is , and on each step record (the index then sitting one pile to the left). Following back from reconstructs an LIS, exactly as in the quadratic version.

Variants on the same machine

Both algorithms bend to a family of related problems with small edits.

Longest non-decreasing subsequence. To allow equal adjacent values (), change lower_bound to upper_bound: search for the first tail strictly greater than . An equal element then extends rather than overwrites, which is precisely the non-strict relaxation. (In the DP, change the test to .)

Counting the number of LIS. Alongside track = the number of longest increasing subsequences ending at . When a strictly better predecessor is found (), reset ; when a tying predecessor is found (), accumulate . The answer sums over all achieving the global maximum . Speeding this counting variant to leans on a Fenwick or segment tree keyed by value to roll up best-length-and-count over smaller predecessors.2

Russian doll envelopes. Each envelope has width and height , and one nests in another only if both dimensions are strictly larger; find the longest nesting chain. Reduce to LIS in two dimensions: sort by width ascending, and break ties by height descending, then run LIS on the height sequence alone. The descending tiebreak is the trick: among envelopes of equal width, the descending order makes it impossible for two of them to both appear in an increasing height run (their heights decrease), so we never illegally nest two equal-width envelopes. With distinct widths this reduces a 2-D nesting to a plain 1-D LIS, solvable in .3

Bitonic and longest decreasing. A longest decreasing subsequence is just LIS on the reversed comparison (or on the negated array). A longest bitonic subsequence, one that increases then decreases, is computed by running the ending-here LIS left-to-right to get and a symmetric decreasing pass right-to-left to get ; the best bitonic peak at has length .

Bitonic length at each peak. A forward pass gives (longest increasing ending at ), a backward pass gives (longest decreasing starting at ); the peak at scores .

On the peak sits at the value : three elements climb up to it () and three descend from it (), and the avoids double-counting the peak, so the whole array is one bitonic run of length .

Takeaways

  • The LIS is a longest strictly-increasing subsequence (order preserved, contiguity not required), not a substring.
  • The DP indexes by the ending index: , the answer is , and parent pointers reconstruct the subsequence.
  • The method keeps a sorted tails array where is the smallest tail of a length- run; each element triggers a lower_bound overwrite-or-append. Overwriting with a smaller tail never loses an achievable length, so is the LIS length; this is patience sorting.
  • Variants reuse the machine: upper_bound for non-decreasing, parallel counts for the number of LIS, sort-then-LIS with a descending tiebreak for Russian-doll envelopes, and forward+backward passes for bitonic.
  • LIS equals the minimum antichain (decreasing-subsequence) cover, the pile count, a constructive instance of Mirsky's theorem.

Footnotes

  1. Skiena, § — Longest Increasing Subsequence: LIS as a canonical sequence DP, with the improvement over the naive quadratic fill.
  2. Erickson, Ch. — Dynamic Programming: augmenting an optimization DP with a parallel count array to enumerate optimal solutions.
  3. CLRS, Ch. 15 — Dynamic Programming (Problem 15-4): the longest-increasing-subsequence problem and its solution underpinning multi-dimensional nesting variants.
Practice

╌╌ END ╌╌