Divide & Conquer/Divide and Conquer & Mergesort

Lesson 2.12,628 words

Divide and Conquer & Mergesort

Divide and conquer breaks a problem into smaller copies of itself, solves them recursively, and stitches the answers together. We meet the paradigm through mergesort — its merge step, its loop-invariant proof, and the recursion tree that pins its cost at Θ(nlogn)\Theta(n\log n) — then glimpse Karatsuba multiplication as a second example of the same idea.

╌╌╌╌

Some problems are easiest to solve by reducing them to smaller versions of themselves. This is the divide-and-conquer paradigm, and it is one of the most productive ideas in all of algorithm design. Every divide-and-conquer algorithm has the same three-part skeleton:

  • Divide the problem into one or more subproblems that are smaller instances of the same problem.
  • Conquer the subproblems by solving them recursively. When a subproblem is small enough (the base case), solve it directly without recursing.
  • Combine the subproblem solutions into a solution for the original problem.

Erickson's advice captures the mindset: assume the recursion already works, so that the recursive calls correctly solve the smaller instances, and focus your energy on the divide and combine steps. This recursion fairy1 stance turns a single hard problem into two manageable questions: how do I split? and how do I merge?

The payoff is always a recurrence. If an instance of size spawns subproblems each of size , and the divide-plus-combine work costs , then the total cost obeys

Almost every algorithm in this module is an exercise in choosing , , and wisely and then reading off . The master theorem (stated at the end of this lesson) turns that reading-off into a mechanical three-case rule; the recursion tree is the picture behind it. Mergesort is the cleanest first example, so we start there.

The sorting problem, revisited

Recall the specification from the previous module:

Insertion sort grew a sorted prefix one element at a time, costing in the worst case. Divide and conquer does much better. The trick is to ask: if I already had two sorted halves, could I finish the job cheaply? The answer, yes, by merging, gives us mergesort.2

Mergesort

To sort the subarray , split it at the midpoint , recursively sort the two halves, and merge them back together. A single element () is already sorted, so it is the base case.

Mergesort on . The top half divides (black arrows) down to singletons — the base cases; the bottom half merges (blue arrows) those sorted runs back up, pair by pair, to the final sorted list.
Algorithm 1:Merge-Sort(A,p,r)\textsc{Merge-Sort}(A, p, r) — sort A[p..r]A[p..r] in increasing order
  1. 1
    if p<rp < r then
  2. 2
    q(p+r)/2q \gets \floor{(p + r) / 2}
    split point
  3. 3
    call Merge-Sort(A,p,q)\textsc{Merge-Sort}(A, p, q)
    sort left half
  4. 4
    call Merge-Sort(A,q+1,r)\textsc{Merge-Sort}(A, q + 1, r)
    sort right half
  5. 5
    call Merge(A,p,q,r)\textsc{Merge}(A, p, q, r)
    combine halves

All the real work lives in the combine step. takes two adjacent sorted runs, and , and interleaves them into a single sorted run in place. It copies each half into a scratch array, then repeatedly takes the smaller of the two front elements and writes it back.

Algorithm 2:Merge(A,p,q,r)\textsc{Merge}(A, p, q, r) — merge sorted A[p..q]A[p..q] and A[q+1..r]A[q+1..r]
  1. 1
    n1qp+1n_1 \gets q - p + 1
  2. 2
    n2rqn_2 \gets r - q
  3. 3
    let L[1..n1+1]L[1..n_1 + 1] and R[1..n2+1]R[1..n_2 + 1] be new arrays
  4. 4
    for i1i \gets 1 to n1n_1 do
  5. 5
    L[i]A[p+i1]L[i] \gets A[p + i - 1]
    copy left half
  6. 6
    for j1j \gets 1 to n2n_2 do
  7. 7
    R[j]A[q+j]R[j] \gets A[q + j]
    copy right half
  8. 8
    L[n1+1]L[n_1 + 1] \gets \infty
    sentinel guards the run end
  9. 9
    R[n2+1]R[n_2 + 1] \gets \infty
  10. 10
    i1i \gets 1
  11. 11
    j1j \gets 1
  12. 12
    for kpk \gets p to rr do
  13. 13
    if L[i]R[j]L[i] \le R[j] then
  14. 14
    A[k]L[i]A[k] \gets L[i]
  15. 15
    ii+1i \gets i + 1
  16. 16
    else
  17. 17
    A[k]R[j]A[k] \gets R[j]
  18. 18
    jj+1j \gets j + 1

The two sentinel values are a small but useful device: once one half is used up, its front element is forever , so the comparison always picks from the other half. This removes the need to test have we run out? on every iteration.

Picture the merge in flight. Two sorted runs and sit above the output; the cursors and point at their smallest uncopied elements, and marks where the next winner lands in . Each step compares to , writes the smaller, and advances that one cursor.

Merge step with cursors and on sorted runs and writing the smaller value into at .

Here and , so wins: it is written to and advances. The two sentinels guard the right ends so the comparison is always well-defined.

Why merge is correct

Merge runs in time on elements: each of the iterations of the final for loop does work and advances exactly one of , . Correctness rests on a loop invariant:

  • Initialization. Before the first iteration , so is empty: it holds the smallest elements, vacuously sorted. Since nothing has been copied, and are indeed the smallest uncopied elements.
  • Maintenance. Suppose (the other case is symmetric). Then is the smallest uncopied element. Appending it to the sorted keeps sorted and now containing the smallest elements. Incrementing and restores the invariant.
  • Termination. The loop ends with , so holds all elements in sorted order. The sentinels guarantee we never read past the real data.

Run the loop to completion on the two halves and and every element lands in its sorted slot below:

The completed merge of sorted halves and interleaves into one sorted run.

Analyzing the cost

Let be the worst-case running time of mergesort on elements. Splitting costs , the two recursive calls cost , and the merge costs . So

To see why this resolves to , draw the recursion tree. Each node is labeled with the non-recursive work it does, the cost of its own merge. The root merges elements; its two children each merge ; the next level has four nodes each merging ; and so on.

Recursion tree for mergesort with each node showing its merge cost, summing to per level.

The key observation: each level sums to the same amount. The root level is ; the next is ; the next is ; in general level has nodes each doing work, for a row total of .

Doubling the node count while halving each node's work keeps every level's total fixed at ; the rows give .

Halving from down to the base case of takes steps, so there are levels. Multiplying the per-level cost by the number of levels:

This is the canonical application of the master theorem (, , , so and we land in the balanced case), but the recursion tree makes the concrete: levels, work apiece.

Stability

Mergesort is stable: equal elements keep their original relative order. This falls out of the in : when we take from , the left (earlier) half, first. Stability matters when records are sorted on one key but carry others: a stable sort lets you sort by secondary key, then primary key, and trust that ties on the primary preserve the secondary ordering.

Mergesort versus other sorts

PropertyMergesortInsertion sortHeapsortQuicksort
Worst case
Average case
Extra space
Stableyesyesnono
In placenoyesyesyes

Mergesort's worst-case guarantee and stability make it the sort of choice when predictability matters or when data does not fit in memory. Its sequential, merge-based access pattern is ideal for sorting linked lists and for external sorting of data streamed from disk.3 Its cost is the auxiliary array. Quicksort, the subject of the next lesson, trades that guarantee for better constants and in-place operation.

Counting inversions

Here is a problem that has nothing to do with sorting on its surface, yet falls to the very machinery we just built. Given a list , how close to sorted is it? A natural measure counts the pairs that are out of order.

A sorted array has zero inversions; a reverse-sorted one has the maximum, . (Inversion counts also drive collaborative-filtering how similar are two rankings? scores.) The brute-force algorithm loops over all pairs and counts the bad ones, costing exactly comparisons. We can do far better.

Idea 0: divide and conquer, just like mergesort. Split into a left half and a right half . Every inversion is one of three kinds:

  • both endpoints in , counted by recursing on ;
  • both endpoints in , counted by recursing on ;
  • one endpoint in each: a cross inversion, on the left and on the right with .
A cross inversion linking an element in left half to a smaller element in right half .

Counting cross inversions with a double loop costs for the combine step, giving , which the master theorem resolves to , no gain. The combine step is the bottleneck.

Idea 1: count cross inversions during a merge. Suppose the two halves arrive already sorted. Walk them with two cursors exactly as does. When we are about to emit and , the element is smaller than and than everything after it in , so forms an inversion with all remaining elements of at once. Add that count, emit , and move on.

This batching is exactly why the count collapses to linear time: a single comparison reveals inversions, not one. Because is sorted, every element from onward exceeds , so each is inverted with it.

When the merge finds , sortedness of means every remaining also exceeds — so emitting adds cross inversions in one stroke.
Algorithm 3:Count-Cross-Inv(B[1..p],C[1..q])\textsc{Count-Cross-Inv}(B[1..p], C[1..q]) — cross inversions, B,CB, C sorted
  1. 1
    ans0\mathit{ans} \gets 0
  2. 2
    i1i \gets 1
  3. 3
    j1j \gets 1
  4. 4
    while ipi \le p and jqj \le q do
  5. 5
    if B[i]C[j]B[i] \le C[j] then
  6. 6
    ii+1i \gets i + 1
    no inversion
  7. 7
    else
  8. 8
    ansans+(pi+1)\mathit{ans} \gets \mathit{ans} + (p - i + 1)
    C[j]C[j] inverts with B[i..p]B[i..p]
  9. 9
    jj+1j \gets j + 1
  10. 10
    return ans\mathit{ans}

This runs in , the linear merge pattern. But it demands sorted halves, so we must sort them first: sorting and costs an extra per level, and there are levels, giving . Better than quadratic, but the repeated sorting is wasteful.

Idea 2: sort and count in one pass. We are doing almost all of mergesort's work anyway, so let the recursion return both the inversion count and a sorted copy of its slice. Then the cross-counting merge also produces the sorted output the parent needs, for free.

Algorithm 4:Sort-And-Count-Inv(A,lo,hi)\textsc{Sort-And-Count-Inv}(A, \mathit{lo}, \mathit{hi}) — sort A[lo..hi]A[\mathit{lo}..\mathit{hi}], return its inversion count
  1. 1
    if hilo\mathit{hi} \le \mathit{lo} then
  2. 2
    return 00
    single element: no inversions
  3. 3
    t(lo+hi)/2t \gets \floor{(\mathit{lo} + \mathit{hi}) / 2}
  4. 4
    cSort-And-Count-Inv(A,lo,t)c \gets \textsc{Sort-And-Count-Inv}(A, \mathit{lo}, t)
    left inversions + sort left
  5. 5
    cc+Sort-And-Count-Inv(A,t+1,hi)c \gets c + \textsc{Sort-And-Count-Inv}(A, t + 1, \mathit{hi})
    right inversions + sort right
  6. 6
    cc+Count-Cross-Inv-And-Merge(A,lo,t,hi)c \gets c + \textsc{Count-Cross-Inv-And-Merge}(A, \mathit{lo}, t, \mathit{hi})
    cross + merge
  7. 7
    return cc

The helper is just with the counting rule from folded in: whenever it takes from the right half because , it adds the number of elements still waiting in the left half. The combine step is now plain linear, so

This is the same recurrence as mergesort, and the same recursion tree explains it: levels, work each. Counting how disordered a list is costs no more, asymptotically, than sorting it.

Karatsuba multiplication: the same idea elsewhere

Sorting is not the only home for divide and conquer. Take a problem that feels unrelated: multiplying two large integers. Adding two -bit numbers is easy. The grade-school ripple-carry method is , and you cannot beat linear since you must at least read the input. Multiplication is the interesting one. The grade-school algorithm forms shifted partial products and adds them, costing . Can divide and conquer do better?

Write each -bit integer as a high half and a low half. With , split into its least-significant bits and its top bits , and likewise split into and :

Splitting each -bit integer and into high and low -bit halves for Karatsuba.

Multiplying out and gives

The powers of two are just bit-shifts (free, up to the additions), so the cost is dominated by the four half-size products , , , . That yields . Unrolling shows this is no improvement at all:

At the leaf term is , which dominates everything: . Splitting bought us nothing, because subproblems of half size is exactly the quadratic regime.

Karatsuba's trick is to compute the middle coefficient without a third and fourth multiplication. Notice the algebraic identity

We were going to compute and anyway. So once we have those two products, the middle term needs only one more multiplication, , plus a few additions. Three half-size products suffice:

Algorithm 5:Karatsuba(x,y)\textsc{Karatsuba}(x, y) — multiply two nn-bit integers
  1. 1
    if n=1n = 1 then
  2. 2
    return xyx \cdot y
    base case
  3. 3
    tn/2t \gets \floor{n / 2}
  4. 4
    split x=a+2tbx = a + 2^{t} b and y=c+2tdy = c + 2^{t} d
    low and high halves
  5. 5
    acKaratsuba(a,c)\mathit{ac} \gets \textsc{Karatsuba}(a, c)
    recursive call 1
  6. 6
    bdKaratsuba(b,d)\mathit{bd} \gets \textsc{Karatsuba}(b, d)
    recursive call 2
  7. 7
    mKaratsuba(ab,dc)m \gets \textsc{Karatsuba}(a - b, d - c)
    recursive call 3 — the trick
  8. 8
    midm+ac+bd\mathit{mid} \gets m + \mathit{ac} + \mathit{bd}
    recovers ad+bcad + bc
  9. 9
    return ac+2tmid+22tbd\mathit{ac} + 2^{t}\,\mathit{mid} + 2^{2t}\,\mathit{bd}
    shift and add

Now recursive calls, each on half-size inputs, with work to split, shift, and add:

The recursion tree makes the cost legible. Level has nodes, each doing work, for a row total of . The rows grow geometrically downward, so the leaves dominate:

Each Karatsuba node spawns half-size children, so the rows grow by downward and the leaf level dominates.

using (the identity ). Trading one multiplication for a handful of additions drops integer multiplication below quadratic. The lesson generalizes: the number of subproblems, not just their size, drives the asymptotics, which is exactly what the master theorem makes precise. (Pushing the same idea to more than two chunks leads to the Fast Fourier Transform and multiplication.)

Strassen's matrix multiplication

The same spend additions to save a multiplication idea cracks the cubic barrier for matrix multiplication. The schoolbook method costs : each of the entries of is a length- dot product. Divide and conquer splits each matrix into four blocks, and the product is read off block by block exactly as for scalars:

Taken at face value, the four output blocks need eight block multiplications, giving . The master theorem (next section) returns : more subproblems exactly cancel their smaller size, so blocking alone buys nothing.

Strassen's 1969 insight is that seven products suffice.4 Form seven recursive -size multiplications,

and recover the output blocks by addition only:

The eighth multiplication is gone, traded for a handful of extra block additions that cost only . The recurrence becomes , and now the leaves win:

Each Strassen node makes recursive half-size multiplications instead of the naive — the fan-out, not the block size, is what drops the cost from to .

The constant is large, so the crossover with the cubic method only pays off for sizable ; in practice Strassen is switched in above a threshold and the base case falls back to the cache-friendly schoolbook multiply. Theoretically, though, it was the first crack in the wall, and a long line of refinements has pushed the exponent down toward .

The master theorem

Every recurrence in this lesson has the form . The recursion-tree analysis we did by hand each time generalizes to a single rule. Compare the branching exponent , the rate at which leaves proliferate, against the work exponent :

The three cases are exactly the three shapes of recursion tree: when the rows grow toward the leaves (Karatsuba), when they are equal every row costs the same (mergesort), and when the root's work dominates. Reading off our examples:

AlgorithmRecurrence vs
Mergesort, balanced
Counting inversions, balanced
Inversions, naive combine, root-heavy
Karatsuba, leaf-heavy

One last sanity check worth stressing: it makes no difference whether the combine cost is written or bounded above by . The recurrences and have the same solution. The master theorem cares only about , , and the exponent .

Takeaways

  • Divide and conquer = divide into smaller copies, conquer recursively, combine. Trust the recursion; focus on the split and the merge. The cost is always a recurrence .
  • divides at the midpoint and combines with a linear-time whose correctness is a clean loop-invariant argument.
  • The recurrence unfolds into a recursion tree with levels of work each, giving .
  • Mergesort is stable and worst-case optimal among comparison sorts, at the cost of extra space, ideal for linked lists and external sorting.
  • Counting inversions reuses the merge: fold a cross-inversion count into so each step adds , sorting and counting together in instead of the brute-force .
  • Karatsuba multiplication shows the paradigm's reach: the identity cuts four subproblems to three, taking integer multiplication from to .
  • The master theorem turns the tree into a rule: compare to for leaf-heavy, balanced, or root-heavy behavior.5

Footnotes

  1. Erickson, Algorithms, Ch. 1 — Recursion: the recursion fairy stance of assuming recursive calls already work and focusing on divide and combine.
  2. CLRS, Ch. 2 (§2.3) — Designing algorithms: mergesort as the canonical divide-and-conquer sort built on a linear-time merge.
  3. Skiena, The Algorithm Design Manual, §4 — Sorting and Searching: mergesort's stability and suitability for linked lists and external sorting.
  4. CLRS, Ch. 4 (§4.2) — Strassen's algorithm for matrix multiplication and its recurrence; the original result is V. Strassen (1969), Gaussian elimination is not optimal, Numerische Mathematik 13, 354–356.
  5. CLRS, Ch. 4 — Divide-and-Conquer: the master theorem comparing against the work exponent to classify leaf-heavy, balanced, and root-heavy recurrences.
Practice

╌╌ END ╌╌