Divide and Conquer & Mergesort

Some problems are easiest to solve by reducing them to smaller versions of themselves. This is the divide-and-conquer paradigm, and it is one of the most productive ideas in all of algorithm design. Every divide-and-conquer algorithm has the same three-part skeleton:

Divide the problem into one or more subproblems that are smaller instances of the same problem.
Conquer the subproblems by solving them recursively. When a subproblem is small enough (the base case), solve it directly without recursing.
Combine the subproblem solutions into a solution for the original problem.

Erickson's advice captures the mindset: assume the recursion already works, so that the recursive calls correctly solve the smaller instances, and focus your energy on the divide and combine steps. This recursion fairy¹ stance turns a single hard problem into two manageable questions: how do I split? and how do I merge?

The payoff is always a recurrence. If an instance of size $n$ spawns $a$ subproblems each of size $n / b$ , and the divide-plus-combine work costs $Θ (n^{c})$ , then the total cost obeys

T (n) = a T (n / b) + Θ (n^{c}) .

Almost every algorithm in this module is an exercise in choosing $a$ , $b$ , and $c$ wisely and then reading off $T (n)$ . The master theorem (stated at the end of this lesson) turns that reading-off into a mechanical three-case rule; the recursion tree is the picture behind it. Mergesort is the cleanest first example, so we start there.

The sorting problem, revisited

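Recall the specification from the previous module: the input is a sequence of $n$ elements $⟨ a_{1}, a_{2}, \dots, a_{n} ⟩$ , and the output is a permutation $⟨ a'_{1}, a'_{2}, \dots, a'_{n} ⟩$ of the input with $a'_{1} \leq a'_{2} \leq \dots \leq a'_{n}$ .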

Insertion sort grew a sorted prefix one element at a time, costing $Θ (n^{2})$ in the worst case. Divide and conquer does much better. Ask: if I already had two sorted halves, could I finish the job cheaply? The answer, yes, by merging, gives us mergesort.²

Mergesort

To sort the subarray $A [p .. r]$ , split it at the midpoint $q = ⌊ (p + r) /2 ⌋$ , recursively sort the two halves, and merge them back together. A subarray with at most one element ( $p \geq r$ ) is already sorted, so it is the base case.

Figure: Mergesort on $⟨ 5, 2, 4, 7, 1, 3, 2, 6 ⟩$ . The top half divides (black arrows) down to singletons, the base cases; the bottom half merges (blue arrows) those sorted runs back up, pair by pair, to the final sorted list.

Algorithm 1: $\textsc{Merge-Sort}(A, p, r)$ : sort $A[p..r]$ in increasing order

if $p < r$ then
    $q \gets \lfloor (p + r) / 2 \rfloor$    // split point
    call $\textsc{Merge-Sort}(A, p, q)$    // sort left half
    call $\textsc{Merge-Sort}(A, q + 1, r)$    // sort right half
    call $\textsc{Merge}(A, p, q, r)$    // combine halves

All the real work lives in the combine step. $Merge$ takes two adjacent sorted runs, $A [p .. q]$ and $A [q + 1.. r]$ , and interleaves them into a single sorted run that overwrites $A [p .. r]$ . It copies each half into a scratch array, then repeatedly takes the smaller of the two front elements and writes it back.

Algorithm 2: $\textsc{Merge}(A, p, q, r)$ : merge sorted $A[p..q]$ and $A[q+1..r]$

$n_1 \gets q - p + 1$
$n_2 \gets r - q$
let $L[1..n_1 + 1]$ and $R[1..n_2 + 1]$ be new arrays
for $i \gets 1$ to $n_1$ do
    $L[i] \gets A[p + i - 1]$    // copy left half
for $j \gets 1$ to $n_2$ do
    $R[j] \gets A[q + j]$    // copy right half
$L[n_1 + 1] \gets \infty$    // sentinel guards the run end
$R[n_2 + 1] \gets \infty$
$i \gets 1$
$j \gets 1$
for $k \gets p$ to $r$ do
    if $L[i] \le R[j]$ then
        $A[k] \gets L[i]$
        $i \gets i + 1$
    else
        $A[k] \gets R[j]$
        $j \gets j + 1$

mergesort.py (Python)

from typing import Protocol, TypeVar

class Comparable(Protocol):
  """
    Anything that supports `<=`; mergesort needs nothing more.\n
  """

  def __le__(self, other: object) -> bool: ...

Element = TypeVar("Element", bound=Comparable)

def merge(left: list[Element], right: list[Element]) -> list[Element]:
  """
    Interleave two already-sorted runs into one sorted list.\n
    Ties favour `left`, so equal elements keep their original order and the\n
    overall sort stays stable. Runs in O(len(left) + len(right)).\n
  """
  merged: list[Element] = []
  left_cursor: int = 0
  right_cursor: int = 0

  # take the smaller front element until one run is exhausted.
  while left_cursor < len(left) and right_cursor < len(right):
    if left[left_cursor] <= right[right_cursor]:
      merged.append(left[left_cursor])
      left_cursor += 1
    else:
      merged.append(right[right_cursor])
      right_cursor += 1

  # one run is empty; the other is already sorted, so append its tail.
  merged.extend(left[left_cursor:])
  merged.extend(right[right_cursor:])
  return merged

def mergesort(values: list[Element]) -> list[Element]:
  """
    Return a new sorted copy of `values` in increasing order.\n
    A list of zero or one element is its own base case.\n
  """
  if len(values) <= 1:
    return list(values)

  # split at the midpoint and sort each half, then merge the two runs.
  midpoint: int = len(values) // 2
  sorted_left: list[Element] = mergesort(values[:midpoint])
  sorted_right: list[Element] = mergesort(values[midpoint:])
  return merge(sorted_left, sorted_right)

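The two sentinel values $\infty$ in the pseudocode are a small but useful device: once one half is used up, its front element is forever $\infty$ , so the comparison always picks from the other half. This removes the need to test "have we run out?" on every iteration. The Python listing takes the other route: its loop stops as soon as either run is empty and then appends the leftover tail.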
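For comparison, the sentinel version of $Merge$ and the index-based $Merge-Sort$ can be transcribed into Python roughly as follows. This is a sketch under stated assumptions, not the lesson's reference listing: the names `merge_with_sentinels` and `merge_sort_in_range` are made up for illustration, indices are 0-based and inclusive, and every value is assumed to be a finite number so that `math.inf` can act as the sentinel.

```python
import math


def merge_with_sentinels(a: list[float], p: int, q: int, r: int) -> None:
  """Merge sorted a[p..q] and a[q+1..r] (inclusive, 0-based) back into a."""
  # Copy each run into scratch space and cap it with an infinite sentinel.
  left = a[p:q + 1] + [math.inf]
  right = a[q + 1:r + 1] + [math.inf]

  i = j = 0
  for k in range(p, r + 1):
    # A drained run exposes only its sentinel, so the other run always wins.
    if left[i] <= right[j]:
      a[k] = left[i]
      i += 1
    else:
      a[k] = right[j]
      j += 1


def merge_sort_in_range(a: list[float], p: int, r: int) -> None:
  """Sort a[p..r] in increasing order, mirroring Merge-Sort(A, p, r)."""
  if p < r:
    q = (p + r) // 2
    merge_sort_in_range(a, p, q)
    merge_sort_in_range(a, q + 1, r)
    merge_with_sentinels(a, p, q, r)


data = [5, 2, 4, 7, 1, 3, 2, 6]
merge_sort_in_range(data, 0, len(data) - 1)
print(data)
```

The loop body contains no end-of-run test at all; the price is the assumption that no real element compares greater than the sentinel.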

Picture the merge in flight. Two sorted runs $L$ and $R$ sit above the output; the cursors $i$ and $j$ point at their smallest uncopied elements, and $k$ marks where the next winner lands in $A$ . Each step compares $L [i]$ to $R [j]$ , writes the smaller, and advances that one cursor.

Figure: Merge step with cursors $i$ and $j$ on sorted runs $L$ and $R$ , writing the smaller value into $A [k]$ . The last cell of each run holds the sentinel $\infty$ .

Here $L [i] = 11$ and $R [j] = 8$ , so $R [j]$ wins: it is written to $A [k]$ and $j$ advances. The two $\infty$ sentinels guard the right ends so the comparison is always well-defined.

Why merge is correct

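Merge runs in $Θ (n)$ time on $n = r - p + 1$ elements: each of the $n$ iterations of the final for loop does $O (1)$ work and advances exactly one of $i$ , $j$ . Correctness rests on a loop invariant: at the start of each iteration of the final for loop, $A [p .. k - 1]$ holds the $k - p$ smallest elements of $L$ and $R$ in sorted order, and $L [i]$ and $R [j]$ are the smallest elements of their runs not yet copied back. When the loop ends, $k = r + 1$ , so $A [p .. r]$ holds all $n$ elements in sorted order.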

For example, run the loop to completion on the two halves $⟨ 2, 4, 5, 7 ⟩$ and $⟨ 1, 2, 3, 6 ⟩$ . After the copy phase, $L = ⟨ 2, 4, 5, 7, \infty ⟩$ and $R = ⟨ 1, 2, 3, 6, \infty ⟩$ . Each row below is one iteration of the for loop: one comparison, one write, one cursor advance.

$k$	comparison	winner	$A [p .. k]$ after the write
$1$	$L [1] = 2$ vs $R [1] = 1$	$R$	$⟨ 1 ⟩$
$2$	$L [1] = 2$ vs $R [2] = 2$	$L$ (tie: $\leq$ takes left)	$⟨ 1, 2 ⟩$
$3$	$L [2] = 4$ vs $R [2] = 2$	$R$	$⟨ 1, 2, 2 ⟩$
$4$	$L [2] = 4$ vs $R [3] = 3$	$R$	$⟨ 1, 2, 2, 3 ⟩$
$5$	$L [2] = 4$ vs $R [4] = 6$	$L$	$⟨ 1, 2, 2, 3, 4 ⟩$
$6$	$L [3] = 5$ vs $R [4] = 6$	$L$	$⟨ 1, 2, 2, 3, 4, 5 ⟩$
$7$	$L [4] = 7$ vs $R [4] = 6$	$R$	$⟨ 1, 2, 2, 3, 4, 5, 6 ⟩$
$8$	$L [4] = 7$ vs $R [5] = \infty$	$L$	$⟨ 1, 2, 2, 3, 4, 5, 6, 7 ⟩$

Two rows deserve a second look. At $k = 2$ the fronts tie at $2$ ; the $\leq$ comparison takes from $L$ , the left half — the choice that makes the sort stable (more on this below). At $k = 8$ the right run is exhausted and its front is the sentinel $\infty$ , so the comparison automatically drains the rest of $L$ with no special end-of-run test. Eight iterations, eight writes, each element landing in its sorted slot:

Figure: The completed merge of sorted halves $⟨ 2, 4, 5, 7 ⟩$ and $⟨ 1, 2, 3, 6 ⟩$ interleaves into one sorted run.
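The trace can also be replayed mechanically. The sketch below is illustrative only: `trace_merge` is a hypothetical helper, it numbers iterations from $k = 1$ as the table does, and it assumes finite integer inputs. After every write it asserts the property the merge is meant to maintain, namely that the output so far is exactly the $k$ smallest elements in sorted order.

```python
import math


def trace_merge(
  left_run: list[int],
  right_run: list[int],
) -> list[tuple[int, str, list[int]]]:
  """Replay the sentinel merge; return (k, winner, output so far) per step."""
  left = [*left_run, math.inf]
  right = [*right_run, math.inf]
  everything = sorted(left_run + right_run)

  output: list[int] = []
  rows: list[tuple[int, str, list[int]]] = []
  i = j = 0
  for k in range(1, len(everything) + 1):
    if left[i] <= right[j]:
      output.append(left[i])
      winner = "L"
      i += 1
    else:
      output.append(right[j])
      winner = "R"
      j += 1
    # After the k-th write the output is the k smallest elements, in order.
    assert output == everything[:k]
    rows.append((k, winner, list(output)))
  return rows


for k, winner, written in trace_merge([2, 4, 5, 7], [1, 2, 3, 6]):
  print(k, winner, written)
```

If the comparison were changed from `<=` to `<`, the output values would be the same but the tie at the second step would be won by the right run instead of the left.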

Analyzing the cost

Let $T (n)$ be the worst-case running time of mergesort on $n$ elements. Splitting costs $Θ (1)$ , the two recursive calls cost $2 T (n /2)$ , and the merge costs $Θ (n)$ . So

T (n) = 2 T (n /2) + Θ (n), T (1) = Θ (1) .

To see why this resolves to $Θ (n log n)$ , draw the recursion tree. Each node is labeled with the non-recursive work it does, the cost of its own merge. The root merges $n$ elements; its two children each merge $n /2$ ; the next level has four nodes each merging $n /4$ ; and so on.

Figure: Recursion tree for mergesort with each node showing its merge cost, summing to $c n$ per level.

Each level sums to the same amount. The root level is $c n$ ; the next is $2 \cdot c n /2 = c n$ ; the next is $4 \cdot c n /4 = c n$ ; in general level $i$ has $2^{i}$ nodes each doing $c n / 2^{i}$ work, for a row total of $c n$ .

Figure: Doubling the node count while halving each node's work keeps every level's total fixed at $c n$ ; the $log_{2} n + 1$ rows give $Θ (n log n)$ .

Assuming $n$ is a power of two, halving from $n$ down to the base case of $1$ takes $log_{2} n$ steps, so there are $log_{2} n + 1$ levels. Multiplying the per-level cost by the number of levels:

T (n) = c n \cdot (log_{2} n + 1) = Θ (n log n) .

This is the canonical application of the master theorem ( $a = 2$ , $b = 2$ , $c = 1$ , so $log_{b} a = 1 = c$ and we land in the balanced case), but the recursion tree makes the $n log n$ concrete: $log n$ levels, $n$ work apiece.
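The level-by-level sum is easy to check numerically. The sketch below makes simplifying assumptions that are not part of the analysis itself: $n$ is a power of two, the hidden constant is a plain integer `c`, and $T (1) = c$ . The function names are made up for illustration.

```python
def mergesort_cost(n: int, c: int = 1) -> int:
  """Evaluate T(n) = 2 T(n/2) + c*n with T(1) = c, for n a power of two."""
  if n == 1:
    return c
  return 2 * mergesort_cost(n // 2, c) + c * n


def level_totals(n: int, c: int = 1) -> list[int]:
  """Cost of each recursion-tree level: 2^i nodes, each doing c*n/2^i work."""
  # For a power of two, bit_length() equals log2(n) + 1, the number of levels.
  levels = n.bit_length()
  return [(2 ** i) * (c * n // 2 ** i) for i in range(levels)]


n = 8
print(level_totals(n))        # every level costs c*n
print(sum(level_totals(n)))   # c * n * (log2(n) + 1)
print(mergesort_cost(n))      # the recurrence unrolled directly
```

Under these assumptions the two computations agree exactly, which is the identity $T (n) = c n \cdot (log_{2} n + 1)$ in miniature.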

Stability

A sort is stable if elements with equal keys leave in the same relative order they arrived in. Mergesort is stable, and this falls out of the $\leq$ in $Merge$ : when $L [i] = R [j]$ we take from $L$ , the left (earlier) half, first. We saw it happen in the trace above, at $k = 2$ : the two front elements tied at $2$ , and the left half's copy was emitted first. Since every element of $L$ came from earlier positions in $A$ than every element of $R$ , and recursion preserves the property inductively, equal elements never swap places.

Figure: Stability on $⟨ 3_{a}, 1, 3_{b}, 2, 3_{c} ⟩$ : the three equal keys (subscripts mark original order) arrive in the output in the same left-to-right order they started in.

Stability matters when records are sorted on one key but carry others: a stable sort lets you sort by secondary key, then primary key, and trust that ties on the primary preserve the secondary ordering. Sorting employees by department after sorting them by name leaves each department's roster alphabetized — but only if the second sort is stable.
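The two-pass trick can be seen on a toy roster. Everything in this sketch is assumed for illustration: the employee records are invented, and `stable_mergesort` is a hypothetical key-based variant of mergesort written only to show the tie-breaking rule at work.

```python
from typing import Any, Callable, TypeVar

Record = TypeVar("Record")


def stable_mergesort(
  records: list[Record],
  key: Callable[[Record], Any],
) -> list[Record]:
  """Return a new list sorted by `key`, keeping equal keys in input order."""
  if len(records) <= 1:
    return list(records)
  midpoint = len(records) // 2
  left = stable_mergesort(records[:midpoint], key)
  right = stable_mergesort(records[midpoint:], key)

  merged: list[Record] = []
  i = j = 0
  while i < len(left) and j < len(right):
    # `<=` takes from the left run on ties; this is what makes the sort stable.
    if key(left[i]) <= key(right[j]):
      merged.append(left[i])
      i += 1
    else:
      merged.append(right[j])
      j += 1
  merged.extend(left[i:])
  merged.extend(right[j:])
  return merged


# Invented (name, department) records.
employees = [
  ("Ng", "Sales"), ("Abel", "Ops"), ("Cruz", "Sales"),
  ("Diaz", "Ops"), ("Baker", "Sales"),
]
by_name = stable_mergesort(employees, key=lambda e: e[0])
by_department = stable_mergesort(by_name, key=lambda e: e[1])
print(by_department)
```

Because the second pass never reorders records with the same department, each department's block stays in the alphabetical order produced by the first pass.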

Mergesort versus other sorts

Property	Mergesort	Insertion sort	Heapsort	Quicksort
Worst case	$Θ (n log n)$	$Θ (n^{2})$	$Θ (n log n)$	$Θ (n^{2})$
Average case	$Θ (n log n)$	$Θ (n^{2})$	$Θ (n log n)$	$Θ (n log n)$
Extra space	$Θ (n)$	$Θ (1)$	$Θ (1)$	$Θ (log n)$
Stable	yes	yes	no	no
In place	no	yes	yes	yes

Mergesort's worst-case guarantee and stability make it the sort of choice when predictability matters or when data does not fit in memory. Its sequential, merge-based access pattern is ideal for sorting linked lists and for external sorting of data streamed from disk.³ Its cost is the $Θ (n)$ auxiliary array. Quicksort, the subject of the next lesson, trades that guarantee for better constants and in-place operation.

When the recursion is not worth it

Divide and conquer wins asymptotically, but each recursive call carries real overhead: stack frames, index arithmetic, the scratch-array traffic of $Merge$ . On a subarray of ten elements, insertion sort's tight loop with no allocation beats all of that machinery outright. Two standard adjustments exploit this.

Cut off to insertion sort. Stop recursing once the subarray shrinks below a threshold $k$ and finish it with insertion sort. The $n / k$ base cases cost $Θ (k^{2})$ each, for $Θ (nk)$ total, while the merging now spans only $log (n / k)$ levels of $Θ (n)$ work apiece:

T (n) = Θ (nk + n log (n / k)) .

For constant $k$ this is still $Θ (n log n)$ — the asymptotics are untouched — but the constant factor drops because the bottom $log k$ levels of the recursion tree, the levels with the most nodes and the most per-call overhead, are replaced by a handful of cheap quadratic sorts. In practice $k$ is tuned somewhere between $8$ and $32$ .
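A hybrid along these lines might look like the sketch below. It is one possible arrangement, not a tuned implementation: the function names and the default `cutoff` of 16 are assumptions chosen for illustration, `cutoff` must be at least 1, and indices are 0-based and inclusive.

```python
def insertion_sort_range(a: list[int], lo: int, hi: int) -> None:
  """Sort a[lo..hi] (inclusive) in place by growing a sorted prefix."""
  for k in range(lo + 1, hi + 1):
    item = a[k]
    slot = k - 1
    # Shift larger elements right until the slot for `item` opens up.
    while slot >= lo and a[slot] > item:
      a[slot + 1] = a[slot]
      slot -= 1
    a[slot + 1] = item


def hybrid_mergesort(a: list[int], lo: int, hi: int, cutoff: int = 16) -> None:
  """Sort a[lo..hi] in place; small subarrays fall back to insertion sort."""
  if hi - lo + 1 <= cutoff:
    insertion_sort_range(a, lo, hi)
    return
  mid = (lo + hi) // 2
  hybrid_mergesort(a, lo, mid, cutoff)
  hybrid_mergesort(a, mid + 1, hi, cutoff)

  # Merge through scratch copies of the two sorted halves.
  left, right = a[lo:mid + 1], a[mid + 1:hi + 1]
  i = j = 0
  for k in range(lo, hi + 1):
    # Take from the left when the right is drained or the left front is <=.
    if j >= len(right) or (i < len(left) and left[i] <= right[j]):
      a[k] = left[i]
      i += 1
    else:
      a[k] = right[j]
      j += 1


original = list(range(50, 0, -1)) + [7, 7, 3]
data = list(original)
hybrid_mergesort(data, 0, len(data) - 1, cutoff=8)
print(data == sorted(original))
```

Both pieces keep ties in their original order (insertion sort shifts only strictly larger elements, and the merge prefers the left run), so the hybrid remains stable.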

Go bottom-up. The recursion can be removed entirely. Bottom-up mergesort treats the array as $n$ sorted runs of width $1$ , then makes passes that merge adjacent runs pairwise: after the first pass the runs have width $2$ , then $4$ , then $8$ , doubling until one run remains.

Figure: Bottom-up mergesort on $⟨ 5, 2, 4, 7, 1, 3, 2, 6 ⟩$ : each pass merges adjacent runs pairwise, doubling the run width, with no recursion at all.

Each pass is a plain loop over the array doing $Θ (n)$ merge work, and there are $⌈ log_{2} n ⌉$ passes, so the cost is the same $Θ (n log n)$ — the recursion tree read bottom-to-top instead of top-to-bottom. What iteration buys is engineering: no stack, no function-call overhead, and a shape that suits linked lists (splice runs instead of copying) and external sorting, where each pass is one sequential sweep over the data on disk. What it gives up is the cutoff trick's easy hybridization and any chance to exploit runs that are already sorted — refinements that top-down and bottom-up variants alike can bolt back on.
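The doubling passes can be written as two nested loops. The sketch below is a minimal version under assumptions of its own: `bottom_up_mergesort` is an illustrative name, and it builds a fresh list on every pass for clarity rather than ping-ponging between two preallocated buffers as a tuned implementation might.

```python
def bottom_up_mergesort(values: list[int]) -> list[int]:
  """Return a sorted copy using doubling-width passes instead of recursion."""
  source = list(values)
  n = len(source)
  width = 1
  while width < n:
    target: list[int] = []
    # Merge each adjacent pair of runs source[lo:mid] and source[mid:hi].
    for lo in range(0, n, 2 * width):
      mid = min(lo + width, n)
      hi = min(lo + 2 * width, n)
      i, j = lo, mid
      while i < mid and j < hi:
        if source[i] <= source[j]:
          target.append(source[i])
          i += 1
        else:
          target.append(source[j])
          j += 1
      # One run is drained; copy whatever is left of the other.
      target.extend(source[i:mid])
      target.extend(source[j:hi])
    source = target
    width *= 2
  return source


print(bottom_up_mergesort([5, 2, 4, 7, 1, 3, 2, 6]))
```

The `min(..., n)` clamps handle lengths that are not powers of two: a trailing run with no partner is simply copied through to the next pass.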

The broader moral: divide and conquer sets the asymptotic ceiling, but at small sizes a simple iterative method with better constants wins, so real implementations are hybrids — recursion (or doubling passes) for the large scales, iteration for the base.

Counting inversions

Here is a problem that has nothing to do with sorting on its surface, yet falls to the very machinery we just built. Given a list $⟨ a_{1}, a_{2}, \dots, a_{n} ⟩$ , how close to sorted is it? A natural measure counts the pairs that are out of order: an inversion is a pair of positions $i < j$ with $a_{i} > a_{j}$ .

A sorted array has zero inversions; a reverse-sorted one has the maximum, $\binom{n}{2}$ . (Inversion counts also drive collaborative-filtering "how similar are two rankings?" scores.) The brute-force algorithm loops over all pairs and counts the bad ones, costing exactly $\binom{n}{2} = \frac{1}{2} n (n - 1) = Θ (n^{2})$ comparisons. We can do far better.
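The quadratic baseline is worth having on hand as a correctness oracle for the faster versions. A minimal sketch, with the function name chosen here for illustration:

```python
from itertools import combinations


def count_inversions_brute_force(values: list[int]) -> int:
  """Count pairs of positions i < j with values[i] > values[j]."""
  # combinations() yields pairs in position order, so `earlier` precedes `later`.
  return sum(
    1 for earlier, later in combinations(values, 2) if earlier > later
  )


print(count_inversions_brute_force([2, 4, 1, 3, 5]))
print(count_inversions_brute_force([4, 3, 2, 1]))
```

It performs one comparison per pair, so the work is exactly $\frac{1}{2} n (n - 1)$ comparisons regardless of the input order.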

Idea 0: divide and conquer, just like mergesort. Split $A$ into a left half $B$ and a right half $C$ . Every inversion is one of three kinds:

both endpoints in $B$ , counted by recursing on $B$ ;
both endpoints in $C$ , counted by recursing on $C$ ;
one endpoint in each: a cross inversion, $i$ on the left and $j$ on the right with $B [i] > C [j]$ .

Figure: A cross inversion linking an element in left half $B$ to a smaller element in right half $C$ .

Counting cross inversions with a double loop costs $Θ (n^{2})$ for the combine step, giving $T (n) = 2 T (n /2) + Θ (n^{2})$ , which the master theorem resolves to $Θ (n^{2})$ , no gain. The combine step is the bottleneck.

Idea 1: count cross inversions during a merge. Suppose the two halves arrive already sorted, and let $p$ be the length of $B$ . Walk them with two cursors exactly as $Merge$ does. Whenever the fronts satisfy $B [i] > C [j]$ , the element $C [j]$ is smaller than $B [i]$ and than everything after it in $B$ , so $C [j]$ forms an inversion with all $p - i + 1$ remaining elements of $B$ at once. Add that count, emit $C [j]$ , and move on.

This batching is why the count collapses to linear time: a single comparison reveals $p - i + 1$ inversions, not one. Because $B$ is sorted, every element from $B [i]$ onward exceeds $C [j]$ , so each is inverted with it.

Figure: When the merge finds $B [i] > C [j]$ , sortedness of $B$ means every remaining $B [i .. p]$ also exceeds $C [j]$ , so emitting $C [j]$ adds $p - i + 1$ cross inversions in one stroke.

Algorithm 3: $\textsc{Count-Cross-Inv}(B[1..p], C[1..q])$ : cross inversions, $B$ and $C$ sorted

$\mathit{ans} \gets 0$
$i \gets 1$
$j \gets 1$
while $i \le p$ and $j \le q$ do
    if $B[i] \le C[j]$ then
        $i \gets i + 1$    // no inversion
    else
        $\mathit{ans} \gets \mathit{ans} + (p - i + 1)$    // $C[j]$ inverts with $B[i..p]$
        $j \gets j + 1$
return $\mathit{ans}$

This runs in $Θ (p + q) = Θ (n)$ , the linear merge pattern. But it demands sorted halves, so we must sort them first: sorting $B$ and $C$ costs an extra $Θ (n log n)$ per level, and there are $log n$ levels, giving $T (n) = 2 T (n /2) + Θ (n log n) = Θ (n log^{2} n)$ . Better than quadratic, but the repeated sorting is wasteful.
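Idea 1 can be sketched directly, which makes the wasted work visible. The names below are illustrative, indices are 0-based (so the batch size $p - i + 1$ becomes `len(left_sorted) - i`), and Python's built-in `sorted` stands in for "sort the halves first".

```python
def count_cross_inversions(left_sorted: list[int], right_sorted: list[int]) -> int:
  """Count pairs (b, c) with b in left, c in right and b > c; inputs sorted."""
  count = 0
  i = j = 0
  while i < len(left_sorted) and j < len(right_sorted):
    if left_sorted[i] <= right_sorted[j]:
      i += 1  # left front is not inverted with the right front
    else:
      # right front is smaller than every left element from position i on.
      count += len(left_sorted) - i
      j += 1
  return count


def count_inversions_resorting(values: list[int]) -> int:
  """Idea 1: recurse on both halves, then re-sort them to count cross pairs."""
  if len(values) <= 1:
    return 0
  midpoint = len(values) // 2
  left, right = values[:midpoint], values[midpoint:]
  return (
    count_inversions_resorting(left)
    + count_inversions_resorting(right)
    # The two sorted() calls are the repeated work that Idea 2 removes.
    + count_cross_inversions(sorted(left), sorted(right))
  )


print(count_inversions_resorting([2, 4, 1, 3, 5]))
```

Every call sorts its halves from scratch even though the recursive calls below it have already examined the same elements.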

Idea 2: sort and count in one pass. We are doing almost all of mergesort's work anyway, so let the recursion return both the inversion count and a sorted copy of its slice. Then the cross-counting merge also produces the sorted output the parent needs, for free.

Algorithm 4: $\textsc{Sort-And-Count-Inv}(A, \mathit{lo}, \mathit{hi})$ : sort $A[\mathit{lo}..\mathit{hi}]$ , return its inversion count

if $\mathit{hi} \le \mathit{lo}$ then
    return $0$    // single element: no inversions
$t \gets \lfloor (\mathit{lo} + \mathit{hi}) / 2 \rfloor$
$c \gets \textsc{Sort-And-Count-Inv}(A, \mathit{lo}, t)$    // left inversions + sort left
$c \gets c + \textsc{Sort-And-Count-Inv}(A, t + 1, \mathit{hi})$    // right inversions + sort right
$c \gets c + \textsc{Count-Cross-Inv-And-Merge}(A, \mathit{lo}, t, \mathit{hi})$    // cross + merge
return $c$

count_inversions.py (Python)

from typing import TypeVar

from comparable import Comparable

Element = TypeVar("Element", bound=Comparable)

def _merge_and_count(
  left: list[Element],
  right: list[Element],
) -> tuple[list[Element], int]:
  """
    Merge two sorted runs and count the cross inversions between them.\n
    When `right[right_cursor]` is the smaller front element, every element\n
    still pending in `left` exceeds it, so all of them are added at once.\n
  """
  merged: list[Element] = []
  left_cursor: int = 0
  right_cursor: int = 0
  cross_inversions: int = 0

  # emit the smaller front element; tie favours left to stay stable.
  while left_cursor < len(left) and right_cursor < len(right):
    if not (right[right_cursor] < left[left_cursor]):
      merged.append(left[left_cursor])
      left_cursor += 1
    else:
      # right element is smaller than every left element still pending.
      cross_inversions += len(left) - left_cursor
      merged.append(right[right_cursor])
      right_cursor += 1

  # one run is drained; the other's tail is already sorted.
  merged.extend(left[left_cursor:])
  merged.extend(right[right_cursor:])
  return merged, cross_inversions

def _sort_and_count(values: list[Element]) -> tuple[list[Element], int]:
  """
    Return a sorted copy of `values` together with its inversion count.\n
    Left and right inversions come from the recursive calls; cross\n
    inversions come from the merge that stitches the halves together.\n
  """
  if len(values) <= 1:
    return list(values), 0

  # sort each half; the recursion tallies the inversions within each.
  midpoint: int = len(values) // 2
  sorted_left, left_inversions = _sort_and_count(values[:midpoint])
  sorted_right, right_inversions = _sort_and_count(values[midpoint:])

  # merge the halves, adding the cross inversions to both sides' counts.
  merged, cross_inversions = _merge_and_count(sorted_left, sorted_right)
  return merged, left_inversions + right_inversions + cross_inversions

def count_inversions(values: list[Element]) -> int:
  """
    The number of pairs (i, j) with i < j and values[i] > values[j].\n
    A sorted list has zero; a reverse-sorted one has the maximum n(n-1)/2.\n
  """
  _, inversions = _sort_and_count(values)
  return inversions

comparable.py (Python)

from typing import Any, Protocol


class Comparable(Protocol):
  """
    Anything orderable with `<` (int, float, str, tuple, date, …).\n
  """

  # `other` is position-only so built-ins (int, str, …), whose dunder
  # operands are position-only, structurally satisfy the protocol.
  def __lt__(self, other: Any, /) -> bool: ...
  def __gt__(self, other: Any, /) -> bool: ...
  def __le__(self, other: Any, /) -> bool: ...
  def __ge__(self, other: Any, /) -> bool: ...

The helper $Count-Cross-Inv-And-Merge$ is just $Merge$ with the counting rule from $Count-Cross-Inv$ folded in: whenever it takes from the right run because that run's front element is strictly smaller than the left run's front element, it adds the number of elements still waiting in the left run. The combine step is now plain linear, so

T (n) = 2 T (n /2) + Θ (n) = Θ (n log n) .

This is the same recurrence as mergesort, and the same recursion tree explains it: $log n$ levels, $Θ (n)$ work each. Counting how disordered a list is costs no more, asymptotically, than sorting it.

A worked count

Run the algorithm on $A = ⟨ 2, 4, 1, 3, 5 ⟩$ , whose inversions are $(2, 1)$ , $(4, 1)$ , and $(4, 3)$ , three in all. Splitting as the Python listing does, with the first $⌊ n /2 ⌋ = 2$ elements on the left, gives $B = ⟨ 2, 4 ⟩$ and $C = ⟨ 1, 3, 5 ⟩$ . Both halves are already sorted, so the recursive calls return $0$ and $0$ , and everything rides on the counting merge:

step	fronts	action	count added	running total
$1$	$B : 2$ vs $C : 1$	emit $1$ from $C$	$2$ (both of $⟨ 2, 4 ⟩$ exceed $1$ )	$2$
$2$	$B : 2$ vs $C : 3$	emit $2$ from $B$	$0$	$2$
$3$	$B : 4$ vs $C : 3$	emit $3$ from $C$	$1$ (only $4$ remains in $B$ )	$3$
$4$	$B : 4$ vs $C : 5$	emit $4$ from $B$	$0$	$3$
$5$	$B$ empty	emit $5$ from $C$	$0$	$3$

Total: $0 + 0 + 3 = 3$ , matching the hand count, and the array leaves the merge sorted as $⟨ 1, 2, 3, 4, 5 ⟩$ , ready for use by the parent call. Step $1$ is the batching in action: one comparison charged two inversions, because sortedness of $B$ guarantees every element from its cursor onward exceeds the emitted value.

Beyond sorting: faster multiplication

Sorting is not the only home for divide and conquer. The same paradigm beats the grade-school $Θ (n^{2})$ algorithm for multiplying large integers (Karatsuba, three half-size products instead of four, $Θ (n^{log_{2} 3})$ ) and the cubic schoolbook algorithm for multiplying matrices (Strassen, seven block products instead of eight, $Θ (n^{log_{2} 7})$ ). Both spend cheap additions to buy back an expensive multiplication, and both fall straight out of the master theorem below. We give them a lesson of their own: Fast Multiplication.
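To make "three half-size products instead of four" concrete, here is a compact Karatsuba sketch. It is only a taste of what the Fast Multiplication lesson covers, written under assumptions of its own: inputs are non-negative Python integers, the split is done on bits rather than decimal digits, and the base-case threshold of 16 is arbitrary.

```python
def karatsuba(x: int, y: int) -> int:
  """Multiply non-negative integers using three half-size products."""
  # Small operands: fall back to the built-in multiply.
  if x < 16 or y < 16:
    return x * y

  # Split each operand into high and low halves of `half` bits.
  half = max(x.bit_length(), y.bit_length()) // 2
  mask = (1 << half) - 1
  x_hi, x_lo = x >> half, x & mask
  y_hi, y_lo = y >> half, y & mask

  hi = karatsuba(x_hi, y_hi)
  lo = karatsuba(x_lo, y_lo)
  # One extra product plus subtractions replaces the two cross products.
  cross = karatsuba(x_hi + x_lo, y_hi + y_lo) - hi - lo

  return (hi << (2 * half)) + (cross << half) + lo


print(karatsuba(1234, 5678) == 1234 * 5678)
```

The identity being used is $(x_{hi} + x_{lo})(y_{hi} + y_{lo}) - x_{hi} y_{hi} - x_{lo} y_{lo} = x_{hi} y_{lo} + x_{lo} y_{hi}$ , which is where the additions buy back a multiplication.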

The master theorem

Almost every recurrence in this lesson has the form $T (n) = a T (n / b) + Θ (n^{c})$ ; the one exception, Idea 1's $Θ (n log n)$ combine, carries an extra log factor that this simple form does not cover. The recursion-tree analysis we did by hand each time generalizes to a single rule. Compare the branching exponent $log_{b} a$ , the rate at which leaves proliferate, against the work exponent $c$ :

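$$T (n) = a T (n / b) + Θ (n^{c}) \;\Longrightarrow\; T (n) = \begin{cases} Θ (n^{log_{b} a}) & \text{if } log_{b} a > c \text{ (leaf-heavy)} \\ Θ (n^{c} log n) & \text{if } log_{b} a = c \text{ (balanced)} \\ Θ (n^{c}) & \text{if } log_{b} a < c \text{ (root-heavy)} \end{cases}$$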

The three cases correspond to the three shapes of recursion tree: when $log_{b} a > c$ the rows grow toward the leaves (Karatsuba), when they are equal every row costs the same (mergesort), and when $log_{b} a < c$ the root's work dominates. Reading off our examples:

Algorithm	Recurrence	$a$	$b$	$c$	$log_{b} a$ vs $c$	$T (n)$
Mergesort	$2 T (n /2) + Θ (n)$	$2$	$2$	$1$	$1 = 1$ , balanced	$Θ (n log n)$
Counting inversions	$2 T (n /2) + Θ (n)$	$2$	$2$	$1$	$1 = 1$ , balanced	$Θ (n log n)$
Inversions, naive combine	$2 T (n /2) + Θ (n^{2})$	$2$	$2$	$2$	$1 < 2$ , root-heavy	$Θ (n^{2})$
Karatsuba	$3 T (n /2) + Θ (n)$	$3$	$2$	$1$	$log_{2} 3 > 1$ , leaf-heavy	$Θ (n^{log_{2} 3})$

master_theorem.py (Python)

import math
from enum import Enum
from typing import NamedTuple

class Regime(Enum):
  """
    Which term of the recursion tree dominates the total cost.\n
  """
  LEAF_HEAVY = "leaf-heavy"
  BALANCED = "balanced"
  ROOT_HEAVY = "root-heavy"

class MasterSolution(NamedTuple):
  """
    The solved recurrence: its regime, the dominating exponent, whether a\n
    log factor appears, and a human-readable Theta(...) description.\n
  """
  regime: Regime
  exponent: float
  has_log_factor: bool
  asymptotic: str

def _format_exponent(exponent: float) -> str:
  """
    Render an exponent compactly: `n`, `1`, or `n^k` with tidy decimals.\n
  """
  # special-case the bare exponents 0 and 1 for a cleaner label.
  if math.isclose(exponent, 0.0, abs_tol=1e-9):
    return "1"
  if math.isclose(exponent, 1.0, abs_tol=1e-9):
    return "n"

  # otherwise trim trailing zeros and render as n^k.
  text: str = f"{round(exponent, 4):g}"
  return f"n^{text}"

def master_theorem(
  subproblems: int,
  shrink_factor: float,
  work_exponent: float,
) -> MasterSolution:
  """
    Solve T(n) = a*T(n/b) + Theta(n^c) for `a` subproblems, shrink factor\n
    `b`, and combine-work exponent `c`. Requires a >= 1 and b > 1.\n
    Returns the regime, the branching/work exponent that wins, and the\n
    Theta(...) bound as a string.\n
  """
  # reject recurrences the theorem can't classify.
  if subproblems < 1:
    raise ValueError("subproblems (a) must be at least 1")
  if shrink_factor <= 1:
    raise ValueError("shrink_factor (b) must be greater than 1")

  # log_b(a) is the rate leaves proliferate; compare it against c.
  branching_exponent: float = math.log(subproblems) / math.log(shrink_factor)

  # leaves win: cost is dominated by the bottom of the recursion tree.
  if branching_exponent > work_exponent + 1e-9:
    regime: Regime = Regime.LEAF_HEAVY
    exponent: float = branching_exponent
    has_log_factor: bool = False

  # balanced: every level costs the same, so a log factor appears.
  elif math.isclose(branching_exponent, work_exponent, abs_tol=1e-9):
    regime = Regime.BALANCED
    exponent = work_exponent
    has_log_factor = True

  # root wins: cost is dominated by the top-level combine work.
  else:
    regime = Regime.ROOT_HEAVY
    exponent = work_exponent
    has_log_factor = False

  # assemble the Theta(...) bound, folding in the log factor if balanced.
  body: str = _format_exponent(exponent)
  asymptotic: str = (
    f"Theta({body} log n)" if has_log_factor else f"Theta({body})"
  )
  return MasterSolution(regime, exponent, has_log_factor, asymptotic)

One last sanity check: the analysis is robust to how the combine cost is written. If the combine cost is only bounded above, as in $T (n) \leq 2 T (n /2) + O (n)$ , the same recursion tree gives the upper bound $T (n) = O (n log n)$ ; the matching lower bound needs the $Θ (n)$ form. Either way, the master theorem depends only on $a$ , $b$ , and the exponent $c$ .

The sort real programs call

Mergesort's clean structure and stability make it the base for the sort that most real programs actually call.

Timsort: exploit the runs already there. The default sort in Python's list and Java's Arrays.sort for objects is Timsort (Tim Peters, 2002), an adaptive, stable mergesort. Real data is rarely random: it arrives with long stretches already ascending or descending — a log file appended over time, a list re-sorted after a few edits. Timsort scans for these natural runs first, reversing descending ones in place, and only merges the runs it finds, so an already-sorted array costs a single $Θ (n)$ pass instead of $Θ (n log n)$ . It extends short runs with an insertion sort up to a minimum length, and it merges runs under a stack invariant that keeps run lengths balanced (the invariant had a famous bug, found in 2015 by researchers formally verifying the merge policy, that could overflow the merge stack — since fixed). The through-line is the bottom-up idea from earlier, made adaptive: instead of blindly doubling from width $1$ , start from the runs already present in the input.
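The run-detection step on its own is simple to sketch. The code below is not CPython's or Java's implementation, only an illustration of the idea: `natural_runs` is a hypothetical helper, it omits the minimum-run-length extension and the merge stack entirely, and it treats only strictly descending stretches as reversible so that equal elements never change order.

```python
def natural_runs(values: list[int]) -> list[list[int]]:
  """Split values into maximal runs, each returned in ascending order."""
  runs: list[list[int]] = []
  n = len(values)
  start = 0
  while start < n:
    end = start + 1
    if end < n and values[end] < values[start]:
      # Strictly descending stretch: extend it, then reverse it.
      while end < n and values[end] < values[end - 1]:
        end += 1
      runs.append(values[start:end][::-1])
    else:
      # Non-descending stretch: extend it as far as it goes.
      while end < n and values[end] >= values[end - 1]:
        end += 1
      runs.append(values[start:end])
    start = end
  return runs


print(natural_runs([1, 2, 3, 4]))
print(natural_runs([5, 4, 3, 1, 2, 6, 0]))
```

An already-sorted input comes back as a single run after one linear scan, which is the source of the $Θ (n)$ best case described above.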

Merging in parallel. The recursion tree's independent subproblems make mergesort a natural fit for multiple cores: the two recursive sorts run on separate threads, and the join waits for both. But a naive parallel mergesort is bottlenecked by its sequential $Θ (n)$ merge at the root, which alone puts $Θ (n)$ on the critical path. The fix is a parallel merge: to merge two sorted halves, take the median of the longer one, binary-search its position in the other, and split both into two pairs of pieces that merge independently, recursively. This drops the span (critical-path length) of the merge to $Θ (log^{2} n)$ and of the whole sort to $Θ (log^{3} n)$ , while keeping the work $Θ (n log n)$ . This splitting idea is the standard route to a scalable merge-based parallel sort. Mergesort's sequential, predictable access pattern, the same property that suits linked lists and disk, also lets it carve cleanly across cores.³
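The splitting step of the parallel merge can be shown sequentially. The sketch below runs on one thread and only marks where the parallelism would go; `divide_and_conquer_merge` is an illustrative name, the list slicing is for readability rather than efficiency, and, because it swaps the inputs so the longer one supplies the pivot, this version does not preserve stability.

```python
from bisect import bisect_left


def divide_and_conquer_merge(left: list[int], right: list[int]) -> list[int]:
  """Merge two sorted lists by splitting both around one pivot element."""
  # Draw the pivot from the longer list so both sub-merges shrink.
  if len(left) < len(right):
    left, right = right, left
  if not left:
    return []

  mid = len(left) // 2
  pivot = left[mid]
  # Everything in right[:split] is < pivot; everything after is >= pivot.
  split = bisect_left(right, pivot)

  # The two sub-merges touch disjoint data; a parallel runtime could
  # execute them at the same time.
  lower = divide_and_conquer_merge(left[:mid], right[:split])
  upper = divide_and_conquer_merge(left[mid + 1:], right[split:])
  return lower + [pivot] + upper


print(divide_and_conquer_merge([2, 4, 5, 7], [1, 2, 3, 6]))
```

Choosing the median of the longer list guarantees that neither sub-merge receives more than about three quarters of the elements, which is what keeps the recursion depth logarithmic.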

Takeaways

Divide and conquer = divide into smaller copies, conquer recursively, combine. Trust the recursion; focus on the split and the merge. The cost is always a recurrence $T (n) = a T (n / b) + Θ (n^{c})$ .
$Mergesort$ divides at the midpoint and combines with a linear-time $Merge$ whose correctness is a clean loop-invariant argument.
The recurrence $T (n) = 2 T (n /2) + Θ (n)$ unfolds into a recursion tree with $log n$ levels of $Θ (n)$ work each, giving $Θ (n log n)$ .
Mergesort is stable and worst-case optimal among comparison sorts, at the cost of $Θ (n)$ extra space, ideal for linked lists and external sorting.
At small sizes the recursion's overhead loses to plain iteration: real implementations cut off to insertion sort below a threshold ( $Θ (nk + n log (n / k))$ ) or run bottom-up, merging width- $1, 2, 4, \dots$ runs with no recursion at all.
Counting inversions reuses the merge: fold a cross-inversion count into $Merge$ so each step adds $p - i + 1$ , sorting and counting together in $Θ (n log n)$ instead of the brute-force $Θ (n^{2})$ .
The same machinery beats grade-school arithmetic — see Fast Multiplication for Karatsuba ( $Θ (n^{log_{2} 3})$ ) and Strassen ( $Θ (n^{log_{2} 7})$ ).
The master theorem turns the tree into a rule: compare $log_{b} a$ to $c$ for leaf-heavy, balanced, or root-heavy behavior.⁴

Footnotes

¹ Erickson, Algorithms, Ch. 1, Recursion: the recursion fairy stance of assuming recursive calls already work and focusing on divide and combine.
² CLRS, Ch. 2 (§2.3), Designing algorithms: mergesort as the canonical divide-and-conquer sort built on a linear-time merge.
³ Skiena, The Algorithm Design Manual, §4, Sorting and Searching: mergesort's stability and suitability for linked lists and external sorting.
⁴ CLRS, Ch. 4, Divide-and-Conquer: the master theorem comparing $log_{b} a$ against the work exponent $c$ to classify leaf-heavy, balanced, and root-heavy recurrences.
