Longest Increasing Subsequence
Given a sequence of numbers, how long is its longest strictly increasing subsequence? A first dynamic program indexes subproblems by the element each subsequence ends at, giving an solution with parent-pointer reconstruction.
╌╌╌╌
The previous lesson aligned two sequences. Now we ask a question about a single one. Given an array of numbers, a longest increasing subsequence (LIS) is a longest set of positions whose values strictly increase, . As with LCS, the word is subsequence, not substring: the chosen elements keep their original order but need not be contiguous. In the subsequence increases and has length , and no longer one exists, so the LIS length is .1
Plot the values against their positions and an LIS is a longest chain of points that climbs as it moves right, never stepping down:
A brute-force scan over all subsequences is hopeless. But LIS has clean optimal substructure, the same optimal-substructure lever behind every dynamic program, and it admits two algorithms worth knowing well: a direct dynamic program, and a slicker method built on patience sorting.
The dynamic program: subsequences that end here
The decisive modeling move, the analog of LCS's index by prefix,
is to index
each subproblem by the element its subsequence is forced to end at. Anchoring
the endpoint is what makes the pieces compose: an increasing subsequence ending at
is some shorter increasing subsequence ending at an earlier, smaller element,
with tacked on.
Every increasing subsequence ends somewhere, so the answer is , not , a small but important distinction from the prefix DPs, where the answer sat in the last cell.
To extend a subsequence so that it ends at , look at all earlier indices whose value is smaller than ; any increasing subsequence ending at such a can be lengthened by appending . We take the best such predecessor, or start fresh with just if none exists:
The guarantees the is defined and yields when no smaller predecessor exists: the subsequence consisting of alone.
Each scans the earlier indices, so the fill is time and space.
- 1for to do
- 2;singleton
- 3for to do
- 4if and then
- 5
- 6best predecessor
- 7
- 8return and the chain
Reconstruction. The array is a forest of parent pointers: to
recover an actual LIS, find the index maximizing , then follow
backwards until it hits nil, reversing the collected indices.
This costs , cheap beside the fill.
The blue endpoints form the recovered chain: starting at the best cell (index , value ) and hopping along visits indices , whose values reverse to the LIS .
The solid chain (indices ) realizes at index , and the row beneath records every . A subtlety worth naming: the predecessor of is not unique — both the (index ) and the (index ) are smaller and end a length- run, so either could precede it. Reconstruction is greedy: it stores and follows a single parent, and we take the earlier index, giving . The dotted arrows mark the other choices: extending the at index instead yields , and stopping at the (index , also ) yields . Whenever has ties or several endpoints reach the maximum length, the LIS is simply not unique.
The method: patience sorting and the tails array
The quadratic inner loop searches all earlier indices for the best predecessor. The trick that removes it is to stop tracking endpoints individually and instead maintain, for each achievable length, the single most useful witness.
The whole algorithm is one pass. For each element ,
binary-search for the first entry (a lower_bound). If one is found,
overwrite it with ; if none is (so exceeds every tail), append . The LIS
length is the final length of .
- 1empty array
- 2for to do
- 3first
- 4if then
- 5append toexceeds all tails: extend
- 6else
- 7shrink length- tail
- 8return
Each element costs one binary search, for total.
Why this is correct
Two facts carry the whole proof. First, is always sorted in increasing order, so binary search is even legal. Second, the updates never reduce any achievable length, so the array's length tracks the true LIS.
Sortedness. A length- increasing subsequence is a length- one with one
more element appended, so the minimum tail of length strictly exceeds that of
length : . An overwrite preserves this.
When we replace by , the lower_bound guarantees
(else would have been the found index) and
, so the array stays strictly
increasing.
Overwriting never hurts. Replacing with a smaller value can only help: any future element that could have extended a length- subsequence ending at the old, larger tail can still extend the one ending at , since is smaller. So no achievable length is ever lost: overwrites only make tails more permissive, never shorter. And appending when it beats every tail records a genuinely new, longer subsequence. Hence equals the LIS length.
Each row marks the overwrite-or-append position in acc. Adding overwrites the
(the first tail ): the length- run now ends at the smaller value ,
leaving room for future growth without ever shortening the array.
Reconstruction in . As written, holds values, not positions, so it loses the actual subsequence. To recover it, store indices: let hold the position whose value is , and on each step record (the index then sitting one pile to the left). Following back from reconstructs an LIS, exactly as in the quadratic version.
Variants on the same machine
Both algorithms bend to a family of related problems with small edits.
Longest non-decreasing subsequence. To allow equal adjacent values
(), change lower_bound to upper_bound: search
for the first tail strictly greater than . An equal element then extends rather
than overwrites, which is precisely the non-strict relaxation. (In the DP,
change the test to .)
Counting the number of LIS. Alongside track = the number of longest increasing subsequences ending at . When a strictly better predecessor is found (), reset ; when a tying predecessor is found (), accumulate . The answer sums over all achieving the global maximum . Speeding this counting variant to leans on a Fenwick or segment tree keyed by value to roll up best-length-and-count over smaller predecessors.2
Russian doll envelopes. Each envelope has width and height , and one
nests in another only if both dimensions are strictly larger; find the longest
nesting chain. Reduce to LIS in two dimensions: sort by width ascending, and
break ties by height descending, then run LIS on the height sequence alone. The
descending tiebreak is the trick: among envelopes of equal width, the descending
order makes it impossible for two of them to both appear in an increasing height
run (their heights decrease), so we never illegally nest
two equal-width
envelopes. With distinct widths this reduces a 2-D nesting to a plain 1-D LIS,
solvable in .3
Bitonic and longest decreasing. A longest decreasing subsequence is just LIS on the reversed comparison (or on the negated array). A longest bitonic subsequence, one that increases then decreases, is computed by running the ending-here LIS left-to-right to get and a symmetric decreasing pass right-to-left to get ; the best bitonic peak at has length .
On the peak sits at the value : three elements climb up to it () and three descend from it (), and the avoids double-counting the peak, so the whole array is one bitonic run of length .
Takeaways
- The LIS is a longest strictly-increasing subsequence (order preserved, contiguity not required), not a substring.
- The DP indexes by the ending index: , the answer is , and parent pointers reconstruct the subsequence.
- The method keeps a sorted tails array where
is the smallest tail of a length- run; each element
triggers a
lower_boundoverwrite-or-append. Overwriting with a smaller tail never loses an achievable length, so is the LIS length; this is patience sorting. - Variants reuse the machine:
upper_boundfor non-decreasing, parallel counts for the number of LIS, sort-then-LIS with a descending tiebreak for Russian-doll envelopes, and forward+backward passes for bitonic. - LIS equals the minimum antichain (decreasing-subsequence) cover, the pile count, a constructive instance of Mirsky's theorem.
Footnotes
- Skiena, § — Longest Increasing Subsequence: LIS as a canonical sequence DP, with the improvement over the naive quadratic fill. ↩
- Erickson, Ch. — Dynamic Programming: augmenting an optimization DP with a parallel count array to enumerate optimal solutions. ↩
- CLRS, Ch. 15 — Dynamic Programming (Problem 15-4): the longest-increasing-subsequence problem and its solution underpinning multi-dimensional nesting variants. ↩
╌╌ END ╌╌