Divide and Conquer & Mergesort
Divide and conquer breaks a problem into smaller copies of itself, solves them recursively, and stitches the answers together. We meet the paradigm through mergesort — its merge step, its loop-invariant proof, and the recursion tree that pins its cost at — then glimpse Karatsuba multiplication as a second example of the same idea.
╌╌╌╌
Some problems are easiest to solve by reducing them to smaller versions of themselves. This is the divide-and-conquer paradigm, and it is one of the most productive ideas in all of algorithm design. Every divide-and-conquer algorithm has the same three-part skeleton:
- Divide the problem into one or more subproblems that are smaller instances of the same problem.
- Conquer the subproblems by solving them recursively. When a subproblem is small enough (the base case), solve it directly without recursing.
- Combine the subproblem solutions into a solution for the original problem.
Erickson's advice captures the mindset: assume the recursion already
works, so that the recursive calls correctly solve the smaller instances, and
focus your energy on the divide and combine steps. This recursion fairy
1
stance turns a single hard problem into two manageable questions: how do I split? and
how do I merge?
The payoff is always a recurrence. If an instance of size spawns subproblems each of size , and the divide-plus-combine work costs , then the total cost obeys
Almost every algorithm in this module is an exercise in choosing , , and wisely and then reading off . The master theorem (stated at the end of this lesson) turns that reading-off into a mechanical three-case rule; the recursion tree is the picture behind it. Mergesort is the cleanest first example, so we start there.
The sorting problem, revisited
Recall the specification from the previous module:
Insertion sort grew a sorted prefix one element at a time, costing in the worst case. Divide and conquer does much better. The trick is to ask: if I already had two sorted halves, could I finish the job cheaply? The answer, yes, by merging, gives us mergesort.2
Mergesort
To sort the subarray , split it at the midpoint , recursively sort the two halves, and merge them back together. A single element () is already sorted, so it is the base case.
- 1if then
- 2split point
- 3callsort left half
- 4callsort right half
- 5callcombine halves
All the real work lives in the combine step. takes two adjacent sorted runs, and , and interleaves them into a single sorted run in place. It copies each half into a scratch array, then repeatedly takes the smaller of the two front elements and writes it back.
- 1
- 2
- 3let and be new arrays
- 4for to do
- 5copy left half
- 6for to do
- 7copy right half
- 8sentinel guards the run end
- 9
- 10
- 11
- 12for to do
- 13if then
- 14
- 15
- 16else
- 17
- 18
The two sentinel values are a small but useful device: once one
half is used up, its front element is forever , so the comparison
always picks from the other half. This removes the need to test have we run out?
on every iteration.
Picture the merge in flight. Two sorted runs and sit above the output; the cursors and point at their smallest uncopied elements, and marks where the next winner lands in . Each step compares to , writes the smaller, and advances that one cursor.
Here and , so wins: it is written to and advances. The two sentinels guard the right ends so the comparison is always well-defined.
Why merge is correct
Merge runs in time on elements: each of the iterations of the final for loop does work and advances exactly one of , . Correctness rests on a loop invariant:
- Initialization. Before the first iteration , so is empty: it holds the smallest elements, vacuously sorted. Since nothing has been copied, and are indeed the smallest uncopied elements.
- Maintenance. Suppose (the other case is symmetric). Then is the smallest uncopied element. Appending it to the sorted keeps sorted and now containing the smallest elements. Incrementing and restores the invariant.
- Termination. The loop ends with , so holds all elements in sorted order. The sentinels guarantee we never read past the real data.
Run the loop to completion on the two halves and and every element lands in its sorted slot below:
Analyzing the cost
Let be the worst-case running time of mergesort on elements. Splitting costs , the two recursive calls cost , and the merge costs . So
To see why this resolves to , draw the recursion tree. Each node is labeled with the non-recursive work it does, the cost of its own merge. The root merges elements; its two children each merge ; the next level has four nodes each merging ; and so on.
The key observation: each level sums to the same amount. The root level is ; the next is ; the next is ; in general level has nodes each doing work, for a row total of .
Halving from down to the base case of takes steps, so there are levels. Multiplying the per-level cost by the number of levels:
This is the canonical application of the master theorem (, , , so and we land in the balanced case), but the recursion tree makes the concrete: levels, work apiece.
Stability
Mergesort is stable: equal elements keep their original relative order. This falls out of the in : when we take from , the left (earlier) half, first. Stability matters when records are sorted on one key but carry others: a stable sort lets you sort by secondary key, then primary key, and trust that ties on the primary preserve the secondary ordering.
Mergesort versus other sorts
| Property | Mergesort | Insertion sort | Heapsort | Quicksort |
|---|---|---|---|---|
| Worst case | ||||
| Average case | ||||
| Extra space | ||||
| Stable | yes | yes | no | no |
| In place | no | yes | yes | yes |
Mergesort's worst-case guarantee and stability make it the sort of choice when predictability matters or when data does not fit in memory. Its sequential, merge-based access pattern is ideal for sorting linked lists and for external sorting of data streamed from disk.3 Its cost is the auxiliary array. Quicksort, the subject of the next lesson, trades that guarantee for better constants and in-place operation.
Counting inversions
Here is a problem that has nothing to do with sorting on its surface, yet falls to the very machinery we just built. Given a list , how close to sorted is it? A natural measure counts the pairs that are out of order.
A sorted array has zero inversions; a reverse-sorted one has the maximum,
. (Inversion counts also drive collaborative-filtering how similar are two rankings?
scores.) The brute-force algorithm loops over all
pairs and counts the bad ones, costing exactly comparisons. We can do far better.
Idea 0: divide and conquer, just like mergesort. Split into a left half and a right half . Every inversion is one of three kinds:
- both endpoints in , counted by recursing on ;
- both endpoints in , counted by recursing on ;
- one endpoint in each: a cross inversion, on the left and on the right with .
Counting cross inversions with a double loop costs for the combine step, giving , which the master theorem resolves to , no gain. The combine step is the bottleneck.
Idea 1: count cross inversions during a merge. Suppose the two halves arrive already sorted. Walk them with two cursors exactly as does. When we are about to emit and , the element is smaller than and than everything after it in , so forms an inversion with all remaining elements of at once. Add that count, emit , and move on.
This batching is exactly why the count collapses to linear time: a single comparison reveals inversions, not one. Because is sorted, every element from onward exceeds , so each is inverted with it.
- 1
- 2
- 3
- 4while and do
- 5if then
- 6no inversion
- 7else
- 8inverts with
- 9
- 10return
This runs in , the linear merge pattern. But it demands sorted halves, so we must sort them first: sorting and costs an extra per level, and there are levels, giving . Better than quadratic, but the repeated sorting is wasteful.
Idea 2: sort and count in one pass. We are doing almost all of mergesort's work anyway, so let the recursion return both the inversion count and a sorted copy of its slice. Then the cross-counting merge also produces the sorted output the parent needs, for free.
- 1if then
- 2returnsingle element: no inversions
- 3
- 4left inversions + sort left
- 5right inversions + sort right
- 6cross + merge
- 7return
The helper is just with the counting rule from folded in: whenever it takes from the right half because , it adds the number of elements still waiting in the left half. The combine step is now plain linear, so
This is the same recurrence as mergesort, and the same recursion tree explains it: levels, work each. Counting how disordered a list is costs no more, asymptotically, than sorting it.
Karatsuba multiplication: the same idea elsewhere
Sorting is not the only home for divide and conquer. Take a problem that feels unrelated: multiplying two large integers. Adding two -bit numbers is easy. The grade-school ripple-carry method is , and you cannot beat linear since you must at least read the input. Multiplication is the interesting one. The grade-school algorithm forms shifted partial products and adds them, costing . Can divide and conquer do better?
Write each -bit integer as a high half and a low half. With , split into its least-significant bits and its top bits , and likewise split into and :
Multiplying out and gives
The powers of two are just bit-shifts (free, up to the additions), so the cost is dominated by the four half-size products , , , . That yields . Unrolling shows this is no improvement at all:
At the leaf term is , which dominates everything: . Splitting bought us nothing, because subproblems of half size is exactly the quadratic regime.
Karatsuba's trick is to compute the middle coefficient without a third and fourth multiplication. Notice the algebraic identity
We were going to compute and anyway. So once we have those two products, the middle term needs only one more multiplication, , plus a few additions. Three half-size products suffice:
- 1if then
- 2returnbase case
- 3
- 4split andlow and high halves
- 5recursive call 1
- 6recursive call 2
- 7recursive call 3 — the trick
- 8recovers
- 9returnshift and add
Now recursive calls, each on half-size inputs, with work to split, shift, and add:
The recursion tree makes the cost legible. Level has nodes, each doing work, for a row total of . The rows grow geometrically downward, so the leaves dominate:
using (the identity ). Trading one multiplication for a handful of additions drops integer multiplication below quadratic. The lesson generalizes: the number of subproblems, not just their size, drives the asymptotics, which is exactly what the master theorem makes precise. (Pushing the same idea to more than two chunks leads to the Fast Fourier Transform and multiplication.)
Strassen's matrix multiplication
The same spend additions to save a multiplication
idea cracks the cubic barrier
for matrix multiplication. The schoolbook method costs : each of
the entries of is a length- dot product. Divide and conquer
splits each matrix into four blocks, and the
product is read off block by block exactly as for scalars:
Taken at face value, the four output blocks need eight block multiplications, giving . The master theorem (next section) returns : more subproblems exactly cancel their smaller size, so blocking alone buys nothing.
Strassen's 1969 insight is that seven products suffice.4 Form seven recursive -size multiplications,
and recover the output blocks by addition only:
The eighth multiplication is gone, traded for a handful of extra block additions that cost only . The recurrence becomes , and now the leaves win:
The constant is large, so the crossover with the cubic method only pays off for sizable ; in practice Strassen is switched in above a threshold and the base case falls back to the cache-friendly schoolbook multiply. Theoretically, though, it was the first crack in the wall, and a long line of refinements has pushed the exponent down toward .
The master theorem
Every recurrence in this lesson has the form . The recursion-tree analysis we did by hand each time generalizes to a single rule. Compare the branching exponent , the rate at which leaves proliferate, against the work exponent :
The three cases are exactly the three shapes of recursion tree: when the rows grow toward the leaves (Karatsuba), when they are equal every row costs the same (mergesort), and when the root's work dominates. Reading off our examples:
| Algorithm | Recurrence | vs | ||||
|---|---|---|---|---|---|---|
| Mergesort | , balanced | |||||
| Counting inversions | , balanced | |||||
| Inversions, naive combine | , root-heavy | |||||
| Karatsuba | , leaf-heavy |
One last sanity check worth stressing: it makes no difference whether the combine cost is written or bounded above by . The recurrences and have the same solution. The master theorem cares only about , , and the exponent .
Takeaways
- Divide and conquer = divide into smaller copies, conquer recursively, combine. Trust the recursion; focus on the split and the merge. The cost is always a recurrence .
- divides at the midpoint and combines with a linear-time whose correctness is a clean loop-invariant argument.
- The recurrence unfolds into a recursion tree with levels of work each, giving .
- Mergesort is stable and worst-case optimal among comparison sorts, at the cost of extra space, ideal for linked lists and external sorting.
- Counting inversions reuses the merge: fold a cross-inversion count into so each step adds , sorting and counting together in instead of the brute-force .
- Karatsuba multiplication shows the paradigm's reach: the identity cuts four subproblems to three, taking integer multiplication from to .
- The master theorem turns the tree into a rule: compare to for leaf-heavy, balanced, or root-heavy behavior.5
Footnotes
- Erickson, Algorithms, Ch. 1 — Recursion: the
recursion fairy
stance of assuming recursive calls already work and focusing on divide and combine. ↩ - CLRS, Ch. 2 (§2.3) — Designing algorithms: mergesort as the canonical divide-and-conquer sort built on a linear-time merge. ↩
- Skiena, The Algorithm Design Manual, §4 — Sorting and Searching: mergesort's stability and suitability for linked lists and external sorting. ↩
- CLRS, Ch. 4 (§4.2) — Strassen's algorithm for matrix multiplication and its recurrence; the original result is V. Strassen (1969),
Gaussian elimination is not optimal,
Numerische Mathematik 13, 354–356. ↩ - CLRS, Ch. 4 — Divide-and-Conquer: the master theorem comparing against the work exponent to classify leaf-heavy, balanced, and root-heavy recurrences. ↩
╌╌ END ╌╌