Lower Bounds for Comparison Sorting

Mergesort, heapsort, and (in expectation) quicksort all run in $Θ (n log n)$ . Insertion sort and the rest do worse. None does better. Could some cleverer algorithm beat $n log n$ ? The answer is no, not if it learns about the input only by comparing elements. This lesson proves a matching lower bound: any comparison sort needs $Ω (n log n)$ comparisons in the worst case.¹ This is one of the rare cases where we can prove a problem hard, pinning its complexity from both sides.

The comparison model

A comparison sort determines the sorted order using only comparisons between pairs of input elements. The only questions it may ask have the form "is $a_{i} \leq a_{j}$ ?" (or the same question with $<$ , $\geq$ , $>$ , or $=$ ). It never inspects the elements' values: it cannot read a digit, hash a key, or use one as an array index. Insertion sort, mergesort, quicksort, and heapsort all live in this model, and so does almost every general-purpose sort.

This restriction is what makes a lower bound possible. Because the only information the algorithm extracts is the outcomes of comparisons, we can account for everything it could possibly learn by counting those outcomes. The elements' actual values are irrelevant beyond their relative order, so we may assume without loss of generality that the input is some permutation of the distinct values $\{1, 2, \dots, n\}$ .

A sort is a decision tree

Fix the number of elements $n$ and fix a comparison sort. We can model its entire behavior as a decision tree: a binary tree whose internal nodes are comparisons and whose leaves are answers.²

Each internal node is labeled $i : j$ , meaning compare $a_{i}$ with $a_{j}$ .
Its two outgoing edges are the two outcomes, $a_{i} \leq a_{j}$ and $a_{i} > a_{j}$ . The algorithm follows the edge matching the actual data.
Each leaf is labeled with a permutation $⟨ π (1), π (2), \dots, π (n) ⟩$ , the sorted order the algorithm declares once it reaches that leaf.

Running the algorithm on a particular input traces a single root-to-leaf path: at each node the data picks the branch, and the leaf reached announces the ordering. The structure of the tree depends only on $n$ , not on the input; the input merely chooses which path through the fixed tree gets walked.

Here is the decision tree for sorting three elements $⟨ a_{1}, a_{2}, a_{3} ⟩$ ; each leaf names the order from smallest to largest.

Decision tree for comparison sorting three elements, with permutation leaves.

Trace the input $⟨ a_{1}, a_{2}, a_{3} ⟩ = ⟨ 6, 2, 9 ⟩$ . At the root we ask $a_{1} : a_{2}$ , that is $6 : 2$ ; since $6 > 2$ we branch right. Next $a_{2} : a_{3}$ , i.e. $2 : 9$ , so $\leq$ sends us left. Finally $a_{1} : a_{3}$ , i.e. $6 : 9$ , again $\leq$ , landing at the leaf $213$ , meaning $a_{2} \leq a_{1} \leq a_{3}$ , which reads off as $2 \leq 6 \leq 9$ . Correct.

The path the input $⟨ 6, 2, 9 ⟩$ walks through the tree. Each comparison's outcome (annotated with the concrete values) picks one branch; the untaken subtrees (grey) are never touched. Three comparisons, one leaf.
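A decision tree for three elements can be written out literally as nested branches. The sketch below hard-codes one possible tree: the right subtree follows the trace of $⟨ 6, 2, 9 ⟩$ described above, while the shape of the left subtree is an assumption, since the text does not trace it. The name `sort_three` and the returned comparison log are illustrative only.

```python
def sort_three(a1: int, a2: int, a3: int) -> tuple[str, list[str]]:
  """Walk a hard-coded decision tree; return (leaf label, comparisons asked)."""
  trace: list[str] = []

  def leq(name_x: str, x: int, name_y: str, y: int) -> bool:
    # every question the algorithm asks passes through here and is logged
    trace.append(f"{name_x}:{name_y}")
    return x <= y

  if leq("a1", a1, "a2", a2):
    # left subtree (assumed shape): a1 <= a2
    if leq("a2", a2, "a3", a3):
      return "123", trace
    if leq("a1", a1, "a3", a3):
      return "132", trace
    return "312", trace

  # right subtree, as in the trace above: a1 > a2
  if leq("a2", a2, "a3", a3):
    if leq("a1", a1, "a3", a3):
      return "213", trace
    return "231", trace
  return "321", trace


leaf, asked = sort_three(6, 2, 9)
print(leaf, asked)
```

Under these assumptions, `sort_three(6, 2, 9)` asks `a1:a2`, `a2:a3`, `a1:a3` and stops at leaf `213`. No path asks more than three questions, which matches the height of the tree.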

Notice what the tree does not record: the values themselves. The inputs $⟨ 6, 2, 9 ⟩$ , $⟨ 5, 1, 8 ⟩$ , and $⟨ 200, - 3, 417 ⟩$ all walk the same path, because they present the same pattern of comparison outcomes. The tree partitions all possible inputs into finitely many equivalence classes, one per leaf, and that finiteness is what we now exploit.
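The claim that order-equivalent inputs walk the same path can be checked by logging every comparison a sort makes. The sketch below uses insertion sort as a stand-in comparison sort (so its tree differs from the one drawn above) and records each question as a triple of original indices and outcome; the function name `insertion_sort_trace` is hypothetical.

```python
def insertion_sort_trace(values: list[int]) -> list[tuple[int, int, bool]]:
  """Insertion-sort `values`; return every (i, j, a_i <= a_j) it observed."""
  order = list(range(len(values)))  # original indices, rearranged as we sort
  trace: list[tuple[int, int, bool]] = []

  for end in range(1, len(order)):
    position = end
    while position > 0:
      left, right = order[position - 1], order[position]
      outcome = values[left] <= values[right]  # the only look at the data
      trace.append((left, right, outcome))
      if outcome:
        break
      # out of order: swap the two indices and keep sinking
      order[position - 1], order[position] = right, left
      position -= 1

  return trace


for sample in ([6, 2, 9], [5, 1, 8], [200, -3, 417], [2, 6, 9]):
  print(sample, insertion_sort_trace(sample))
```

The first three inputs produce identical traces, because they share one pattern of comparison outcomes; `[2, 6, 9]` has a different relative order and follows a different path.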

comparison_decision_tree.py

from itertools import permutations
from typing import Callable, Optional

class DecisionNode:
  """
    One node of a comparison decision tree.\n
    An internal node holds the pair of *original* element indices it compares,\n
    with `left` taken when element[first] <= element[second] and `right`\n
    otherwise. A leaf holds the output ordering as a tuple of original\n
    indices: the element that ends up smallest, then next-smallest, and so on.\n
  """

  def __init__(self) -> None:
    # internal nodes carry the compared pair plus the two outcome branches
    self.comparison: Optional[tuple[int, int]] = None
    self.left: Optional[DecisionNode] = None
    self.right: Optional[DecisionNode] = None

    # a leaf instead carries the output ordering
    self.permutation: Optional[tuple[int, ...]] = None

  @property
  def is_leaf(self) -> bool:
    """
      Whether this node names an output ordering rather than a comparison.\n
    """
    return self.permutation is not None

# A comparison sort works over a list of *original element indices* and asks
# its questions through a "less-than-or-equal" callback keyed by those indices:
# less_or_equal(i, j) answers "is the original element i <= original element j?"
# Keying on stable original indices (never shifting array positions) is what
# makes each comparison a well-defined decision-tree label. The sort reorders
# its index list in place so that, on return, it spells the sorted order.
Comparator = Callable[[int, int], bool]
ComparisonSort = Callable[[list[int], Comparator], None]

class ComparisonDecisionTree:
  """
    The decision tree induced by a comparison sort at a fixed input size.\n
    We run the sort against every one of the n! permutations of\n
    {0, 1, ..., n-1}. Each run traces a root-to-leaf path; recording the\n
    comparisons made and the order produced reconstructs the whole tree.\n
  """

  def __init__(self, sort: ComparisonSort, size: int) -> None:
    self.size: int = size
    self.root: DecisionNode = DecisionNode()
    self._build(sort)

  def _build(self, sort: ComparisonSort) -> None:
    """
      Drive `sort` over all n! inputs, grafting each trace onto the tree.\n
    """
    for ordering in permutations(range(self.size)):
      # this run's element values, and the comparison trace it will produce
      element: tuple[int, ...] = ordering
      path: list[tuple[int, int, bool]] = []

      # answer each query against this run's values and log it onto the path
      def less_or_equal(first: int, second: int) -> bool:
        outcome: bool = element[first] <= element[second]
        path.append((first, second, outcome))
        return outcome

      # sort the identity index list in place; the result is the output ordering
      index_order: list[int] = list(range(self.size))
      sort(index_order, less_or_equal)
      self._graft(path, tuple(index_order))

  def _graft(
    self,
    path: list[tuple[int, int, bool]],
    output_permutation: tuple[int, ...],
  ) -> None:
    """
      Walk `path` from the root, creating nodes as needed, and label the\n
      final node with `output_permutation`.\n
    """
    # follow the trace from the root, creating each branch the first time
    node: DecisionNode = self.root
    for first, second, outcome in path:
      node.comparison = (first, second)
      if outcome:
        if node.left is None:
          node.left = DecisionNode()
        node = node.left
      else:
        if node.right is None:
          node.right = DecisionNode()
        node = node.right

    # the landing node is this input's leaf
    node.permutation = output_permutation

  def height(self) -> int:
    """
      The length of the longest root-to-leaf path — the sort's worst-case\n
      comparison count at this input size.\n
    """

    def depth(node: Optional[DecisionNode]) -> int:
      if node is None or node.is_leaf:
        return 0
      return 1 + max(depth(node.left), depth(node.right))

    return depth(self.root)

  def leaves(self) -> list[DecisionNode]:
    """
      Every reachable leaf, in left-to-right order.\n
    """
    found: list[DecisionNode] = []

    def walk(node: Optional[DecisionNode]) -> None:
      if node is None:
        return
      if node.is_leaf:
        found.append(node)
        return
      walk(node.left)
      walk(node.right)

    walk(self.root)
    return found

  def leaf_count(self) -> int:
    """
      How many leaves the tree reaches: exactly n! when the sort is correct, since only reached leaves are built.\n
    """
    return len(self.leaves())

  def sort_with_tree(self, values: list[int]) -> list[int]:
    """
      Sort `values` by tracing the tree, using only the recorded comparisons.\n
      Demonstrates that the tree alone — no access to element values beyond\n
      "a_i <= a_j?" — suffices to reproduce the sort's output.\n
    """
    # branch on each recorded comparison until we reach a leaf
    node: DecisionNode = self.root
    while not node.is_leaf:
      assert node.comparison is not None
      first, second = node.comparison
      if values[first] <= values[second]:
        assert node.left is not None
        node = node.left
      else:
        assert node.right is not None
        node = node.right

    # the leaf's ordering names where each value belongs
    assert node.permutation is not None
    return [values[position] for position in node.permutation]

Counting leaves and bounding height

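Two observations turn this picture into a theorem.

The worst-case number of comparisons equals the height of the tree. Each input walks one root-to-leaf path and pays one comparison per internal node on it, so the costliest input pays for the longest path.
A correct sort's tree has at least $n!$ reachable leaves. Any of the $n!$ permutations can arrive as input, each demands a different output ordering, and a leaf announces exactly one ordering.³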

The contrapositive is worth seeing concretely. Suppose the inputs $⟨ 1, 2, 3 ⟩$ and $⟨ 2, 1, 3 ⟩$ both ended at a leaf labeled $123$ , i.e. declare $a_{1} \leq a_{2} \leq a_{3}$ . For the first input that is right. For the second it asserts $2 \leq 1$ , which is false: the algorithm has mis-sorted. A leaf commits to one answer, so each of the $n!$ answers the adversary might require needs a leaf of its own.

Why leaves cannot be shared. If both inputs reached the leaf that declares $a_{1} \leq a_{2} \leq a_{3}$ , the verdict would be correct for $⟨ 1, 2, 3 ⟩$ but would assert $2 \leq 1$ for $⟨ 2, 1, 3 ⟩$ , a wrong output.
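The same check can be run mechanically. The sketch below treats a leaf as a tuple of 0-based element indices (an assumed encoding, so the leaf written $123$ in the text becomes `(0, 1, 2)`) and counts how many inputs each leaf sorts correctly; both function names are hypothetical.

```python
from itertools import permutations


def leaf_verdict_is_correct(leaf: tuple[int, ...], values: tuple[int, ...]) -> bool:
  """Does reading `values` in the leaf's declared order give a sorted list?"""
  arranged = [values[index] for index in leaf]
  return all(earlier <= later for earlier, later in zip(arranged, arranged[1:]))


def inputs_served_by(leaf: tuple[int, ...]) -> list[tuple[int, ...]]:
  """All permutations of 1..n for which this leaf's verdict is right."""
  size = len(leaf)
  return [
    values
    for values in permutations(range(1, size + 1))
    if leaf_verdict_is_correct(leaf, values)
  ]


print(leaf_verdict_is_correct((0, 1, 2), (1, 2, 3)))  # the leaf fits this input
print(leaf_verdict_is_correct((0, 1, 2), (2, 1, 3)))  # it would assert 2 <= 1
for leaf in permutations(range(3)):
  print(leaf, inputs_served_by(leaf))
```

Each of the six possible leaves serves exactly one of the six inputs, which is why a correct tree needs a separate leaf per permutation.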

Now we connect the two. A binary tree of height $h$ has at most $2^{h}$ leaves (the count at most doubles each level). Combining with the leaf count above,

$$n! \leq (\text{number of leaves}) \leq 2^{h},$$

and taking $log_{2}$ of both ends gives the key inequality

$$h \geq log_{2} (n!) .$$

So every comparison sort, whatever its strategy, must make at least $log_{2} (n!)$ comparisons on its worst input.

Check it against the tree we drew. For $n = 3$ there are $3! = 6$ orderings, and $log_{2} 6 \approx 2.58$ , so the height must be at least $⌈ 2.58 ⌉ = 3$ . Two comparisons cannot suffice: a tree of height $2$ has at most $2^{2} = 4$ leaves, and $4 < 6$ . Our tree has height exactly $3$ , so for three elements it is optimal: no comparison sort does better, and the bound is met with equality. It remains to see how large $log_{2} (n!)$ grows in general.

The squeeze. A height-$h$ binary tree holds at most $2^{h}$ leaves (capacity, doubling per level), while correctness demands at least $n!$ leaves (the floor). Forcing $2^{h} \geq n!$ gives $h \geq log_{2} (n!)$ .

sorting_lower_bound.py

import math


def min_comparisons(size: int) -> int:
  """
    The information-theoretic floor on worst-case comparisons to sort\n
    `size` elements: ceil(log2(size!)). This is the smallest height a\n
    binary tree with >= size! leaves can have.\n
  """
  if size < 0:
    raise ValueError("size must be non-negative")
  if size <= 1:
    return 0

  # sum log2(k) over k = 2..n (avoids a huge factorial), then round up
  log2_factorial: float = sum(math.log2(factor) for factor in range(2, size + 1))
  return math.ceil(log2_factorial)
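As a cross-check that avoids floating point entirely, the same floor can be computed with integer arithmetic. This is a sketch of an alternative, not part of the listing above: it relies on the identity that, for an integer $m \geq 1$ , $⌈ log_{2} m ⌉$ equals the bit length of $m - 1$ . The names `min_comparisons_exact` and `smallest_height` are hypothetical.

```python
import math


def min_comparisons_exact(size: int) -> int:
  """ceil(log2(size!)) using only integers: the bit length of size! - 1."""
  if size < 0:
    raise ValueError("size must be non-negative")
  return (math.factorial(size) - 1).bit_length()


def smallest_height(size: int) -> int:
  """The same number by definition: the least h with 2**h >= size!."""
  leaves_needed = math.factorial(size)
  height = 0
  while 2 ** height < leaves_needed:
    height += 1
  return height


for size in (3, 4, 5, 8, 16):
  print(size, min_comparisons_exact(size), smallest_height(size))
```

The trade-off is that the factorial itself is built as a big integer, which is fine for moderate `size` but grows quickly; the log-sum approach stays cheap at the cost of rounding error.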

$log_{2} (n!)$ is $Ω (n log n)$

We need a lower bound on $log_{2} (n!)$ . Stirling's approximation gives the sharp estimate

$$n! = \sqrt{2 π n} (\frac{n}{e})^{n} (1 + Θ (\frac{1}{n})),$$

from which, taking logarithms,

$$log_{2} (n!) = n log_{2} n - n log_{2} e + Θ (log n) = n log_{2} n - Θ (n) .$$

The leading term is $n log_{2} n$ , so $log_{2} (n!) = Ω (n log n)$ .⁴
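For readers who want the logarithm step spelled out, here is a worked version. It assumes the standard form of Stirling's approximation, with leading factor $\sqrt{2 π n}$ , and rounds the constant $log_{2} e$ to four decimal places.

```latex
\begin{align*}
\log_2(n!) &= \log_2\sqrt{2\pi n} + n\log_2\frac{n}{e}
              + \log_2\!\left(1 + \Theta\!\left(\tfrac{1}{n}\right)\right) \\
           &= n\log_2 n - n\log_2 e + \tfrac{1}{2}\log_2(2\pi n)
              + \Theta\!\left(\tfrac{1}{n}\right) \\
           &\approx n\log_2 n - 1.4427\,n + \Theta(\log n).
\end{align*}
```

The linear term is subtracted, so the estimate sits below $n log_{2} n$ by about $1.44 n$ , which is still $Ω (n log n)$ .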

If Stirling seems like a heavy tool, an elementary argument reaches the same conclusion. Drop the smallest half of the factors in $n!$ and bound each survivor below by $n /2$ :

$$n! = n \cdot (n - 1) \cdots 2 \cdot 1 \geq \underbrace{\frac{n}{2} \cdot \frac{n}{2} \cdots \frac{n}{2}}_{n /2 \text{ factors}} = (\frac{n}{2})^{n /2} .$$

The picture for $n = 8$ : keep only the larger half of the factors (each at least $n /2 = 4$ ) and throw the rest away. Half the factors, each $\geq n /2$ , already force $n! \geq (n /2)^{n /2}$ .

The elementary bound for $n = 8$ . Of the factors $8, 7, \dots, 1$ , keep the larger half (top, each $\geq n /2 = 4$ ) and drop the smaller half (grey); the kept factors alone give $n! \geq (n /2)^{n /2}$ .
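Written out with numbers, the $n = 8$ case looks like this; the decimal value of $log_{2} (8!)$ is rounded.

```latex
\begin{align*}
8! &= 8\cdot 7\cdot 6\cdot 5\cdot 4\cdot 3\cdot 2\cdot 1 = 40320 \\
   &\geq 8\cdot 7\cdot 6\cdot 5 = 1680
      && \text{(drop the factors } 4, 3, 2, 1 \text{, each} \geq 1\text{)} \\
   &\geq 4\cdot 4\cdot 4\cdot 4 = 4^{4} = 256
      && \text{(each kept factor is} \geq 4\text{)} \\
\log_2(8!) &\approx 15.30 \;\geq\; 4\log_2 4 = 8.
\end{align*}
```

The bound is loose by almost a factor of two here, which is the price of discarding half the factors, but the loss is only a constant factor.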

Taking $log_{2}$ ,

$$log_{2} (n!) \geq \frac{n}{2} log_{2} \frac{n}{2} = \frac{n}{2} (log_{2} n - 1) = Ω (n log n) .$$

Either way the height satisfies $h \geq log_{2} (n!) = Ω (n log n)$ . (The bound is tight up to constants: since $n! \leq n^{n}$ we also have $log_{2} (n!) \leq n log_{2} n$ , so $log_{2} (n!) = Θ (n log n)$ , and mergesort's $Θ (n log n)$ worst case shows that a comparison sort actually achieves this growth rate.)

How tight is it, concretely?

The bound $⌈ log_{2} (n!)⌉$ is not just asymptotically right; for small $n$ it is close to what real algorithms achieve. Mergesort's worst-case comparison count obeys $W (n) = W (⌊ n /2 ⌋) + W (⌈ n /2 ⌉) + n - 1$ with $W (1) = 0$ , and lining it up against the bound:

$n$	$n!$	$⌈ log_{2} (n!)⌉$	mergesort worst case $W (n)$
$3$	$6$	$3$	$3$
$4$	$24$	$5$	$5$
$5$	$120$	$7$	$8$
$8$	$40320$	$16$	$17$
$16$	$\approx 2.09 \times 10^{13}$	$45$	$49$

For $n = 3$ and $n = 4$ mergesort meets the information bound exactly. From $n = 5$ a gap opens: the bound says $7$ , mergesort spends $8$ . The bound is a floor: it guarantees no algorithm goes below it but does not promise an algorithm achieving it, and for most $n$ the best known sorting procedures sit slightly above $⌈ log_{2} (n!)⌉$ . Asymptotically mergesort's gap is only $Θ (n)$ out of $Θ (n log n)$ , and heapsort stays within a constant factor of the bound, which is why both count as asymptotically optimal.
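The table can be regenerated directly from the recurrence. The sketch below memoizes $W (n)$ and computes the bound with exact integer arithmetic; the function names `mergesort_worst_case` and `information_bound` are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def mergesort_worst_case(size: int) -> int:
  """W(n) = W(floor(n/2)) + W(ceil(n/2)) + n - 1, with W(1) = 0."""
  if size <= 1:
    return 0
  lower_half = size // 2
  upper_half = size - lower_half
  # merging the two sorted halves costs at most n - 1 comparisons
  return mergesort_worst_case(lower_half) + mergesort_worst_case(upper_half) + size - 1


def information_bound(size: int) -> int:
  """ceil(log2(size!)), computed as the bit length of size! - 1."""
  return (math.factorial(size) - 1).bit_length()


for size in (3, 4, 5, 8, 16):
  bound = information_bound(size)
  worst = mergesort_worst_case(size)
  print(f"n={size:2d}  bound={bound:2d}  mergesort={worst:2d}  gap={worst - bound}")
```

The gap column is zero for $n = 3$ and $n = 4$ and positive from $n = 5$ onward among the sizes shown.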

sorting_lower_bound.py

import math


def log2_factorial(size: int) -> float:
  """
    log2(size!) computed by summing log2(k), exact for the factors and free\n
    of factorial overflow. This is the quantity the height is bounded below\n
    by.\n
  """
  if size < 0:
    raise ValueError("size must be non-negative")
  return sum(math.log2(factor) for factor in range(2, size + 1))


def stirling_log2_factorial(size: int) -> float:
  """
    Stirling's estimate of log2(size!):\n
        n*log2(n) - n*log2(e) + 0.5*log2(2*pi*n).\n
    Matches `log2_factorial` to within o(n) and exposes the leading\n
    n*log2(n) term that drives the Omega(n log n) bound.\n
  """
  if size <= 1:
    return 0.0

  # convert the natural-log Stirling terms into bits
  bits_per_nat: float = math.log2(math.e)
  return (
    size * math.log2(size)
    - size * bits_per_nat
    + 0.5 * math.log2(2.0 * math.pi * size)
  )


def elementary_lower_bound(size: int) -> float:
  """
    The elementary floor (n/2)*log2(n/2) on log2(size!), obtained by keeping\n
    the larger half of the factors of n! and bounding each below by n/2.\n
    Always <= log2_factorial(size).\n
  """
  if size <= 1:
    return 0.0

  # keep the larger half of n!'s factors, each bounded below by n/2
  half: float = size / 2.0
  return half * math.log2(half)
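To see the three estimates side by side, a self-contained sketch like the following can help. It recomputes each quantity inline rather than importing the listing above, and it obtains $log_{2} (n!)$ from `math.lgamma`, which is a substitution made here for brevity. The name `compare_estimates` is hypothetical.

```python
import math


def compare_estimates(size: int) -> tuple[float, float, float, float]:
  """Return (elementary floor, Stirling estimate, log2(size!), n*log2(n))."""
  # lgamma(n + 1) is ln(n!), so dividing by ln 2 converts it to bits
  exact = math.lgamma(size + 1) / math.log(2)
  stirling = (
    size * math.log2(size)
    - size * math.log2(math.e)
    + 0.5 * math.log2(2.0 * math.pi * size)
  )
  elementary = (size / 2.0) * math.log2(size / 2.0)
  ceiling = size * math.log2(size)
  return elementary, stirling, exact, ceiling


for size in (2, 8, 64, 1024):
  elementary, stirling, exact, ceiling = compare_estimates(size)
  print(f"n={size:5d}  {elementary:10.2f} <= {exact:10.2f} <= {ceiling:10.2f}"
        f"  (Stirling {stirling:10.2f})")
```

The elementary floor and the ceiling $n log_{2} n$ bracket the true value, and the Stirling estimate tracks it closely even for small $n$ .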

The bound survives averaging and randomness

A worst-case bound leaves two possible outs: an algorithm slow on a few pathological inputs but fast on average, or a randomized algorithm, as with quicksort. Neither helps here: the $Ω (n log n)$ bound holds for the average input and for randomized algorithms too.⁵

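Average case. Over a uniformly random input permutation, every comparison sort makes at least $log_{2} (n!)$ comparisons in expectation. The reason is that a binary tree with $L$ leaves at depths $d_{1}, \dots, d_{L}$ has average leaf depth $\bar{d}$ at least $log_{2} L$ : Kraft's inequality gives $\sum_{i} 2^{- d_{i}} \leq 1$ , and Jensen's inequality applied to the convex function $2^{- x}$ gives $2^{- \bar{d}} \leq \frac{1}{L} \sum_{i} 2^{- d_{i}} \leq \frac{1}{L}$ , so $\bar{d} \geq log_{2} L$ ; take $L = n!$ . The worst case bounded the deepest leaf; this bounds the average leaf, and the answer is the same $Ω (n log n)$ . Intuitively, a binary tree that must reach $n!$ distinct leaves cannot make more than a vanishing fraction of them shallow: balancing the tree perfectly is the best possible arrangement, and even then every leaf sits at depth about $log_{2} (n!)$ .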

Average depth is minimized by balance. Both trees have $4$ leaves; the balanced tree (left) has every leaf at depth $2$ , average $2$ , while the lopsided tree (right) averages $(1 + 2 + 3 + 3) /4 = 2.25$ . No shape with $L$ leaves beats average depth $log_{2} L$ .
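The two four-leaf trees in the figure can be checked numerically. The sketch below encodes a tree as nested pairs with the string `"leaf"` at the bottom (an encoding chosen here for brevity), then reports the average leaf depth and the Kraft sum $\sum_{i} 2^{- d_{i}}$ .

```python
import math
from fractions import Fraction

# a tree is either the string "leaf" or a pair (left subtree, right subtree)
BALANCED = (("leaf", "leaf"), ("leaf", "leaf"))
LOPSIDED = ("leaf", ("leaf", ("leaf", "leaf")))


def leaf_depths(tree, depth: int = 0) -> list[int]:
  """Depths of all leaves, left to right."""
  if tree == "leaf":
    return [depth]
  left, right = tree
  return leaf_depths(left, depth + 1) + leaf_depths(right, depth + 1)


def average_depth(tree) -> Fraction:
  depths = leaf_depths(tree)
  return Fraction(sum(depths), len(depths))


def kraft_sum(tree) -> Fraction:
  """Sum of 2**(-depth) over the leaves; never exceeds 1 for a binary tree."""
  return sum((Fraction(1, 2 ** depth) for depth in leaf_depths(tree)), Fraction(0))


for name, tree in (("balanced", BALANCED), ("lopsided", LOPSIDED)):
  print(name, leaf_depths(tree), float(average_depth(tree)), kraft_sum(tree))
print("floor:", math.log2(4))
```

Both trees have Kraft sum exactly $1$ , and both averages stay at or above $log_{2} 4 = 2$ , with equality only for the balanced shape.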

Randomization fares no better. A randomized comparison sort is a probability distribution over deterministic decision trees, one tree per possible sequence of coin flips. Feed it a uniformly random permutation: whichever tree the coins select, that tree's expected comparison count on random input is at least $log_{2} (n!)$ by the theorem above, so the expectation over both coins and input is also at least $log_{2} (n!)$ . There must then exist a fixed input on which the randomized algorithm's expected count is $\geq log_{2} (n!)$ . Coin flips can smooth out which inputs are bad, as they do for quicksort's pivots, but they do not change the number of comparisons required.⁵
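The "one tree per coin sequence" view can be tried out on randomized quicksort. In the sketch below, fixing the seed fixes the coin flips, so each seed yields one deterministic comparison sort, and averaging its cost over all $n!$ inputs should never drop below $log_{2} (n!)$ . This is an illustration under assumptions: the list-based quicksort, the choice to charge one comparison per non-pivot element, and the function names are all made up for this example.

```python
import math
import random
from itertools import permutations


def quicksort_comparisons(values: list[int], rng: random.Random) -> tuple[list[int], int]:
  """Randomized quicksort; returns (sorted list, number of comparisons)."""
  if len(values) <= 1:
    return list(values), 0

  pivot_index = rng.randrange(len(values))  # the coin flip
  pivot = values[pivot_index]
  rest = values[:pivot_index] + values[pivot_index + 1:]

  smaller: list[int] = []
  larger: list[int] = []
  for value in rest:
    # exactly one comparison against the pivot per remaining element
    if value <= pivot:
      smaller.append(value)
    else:
      larger.append(value)

  sorted_smaller, cost_smaller = quicksort_comparisons(smaller, rng)
  sorted_larger, cost_larger = quicksort_comparisons(larger, rng)
  return sorted_smaller + [pivot] + sorted_larger, len(rest) + cost_smaller + cost_larger


def average_comparisons(size: int, seed: int) -> float:
  """Average cost over all size! inputs for the tree selected by `seed`."""
  total = 0
  for ordering in permutations(range(size)):
    rng = random.Random(seed)  # same coins for every input: one fixed tree
    result, cost = quicksort_comparisons(list(ordering), rng)
    assert result == sorted(ordering)
    total += cost
  return total / math.factorial(size)


floor = math.log2(math.factorial(5))
for seed in range(5):
  print(seed, average_comparisons(5, seed), ">=", floor)
```

Individual inputs may get lucky pivots and finish below the floor, but the average over all inputs cannot, whichever seed is chosen.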

The conclusion

Because comparisons are a lower bound on total work, no comparison-based algorithm can sort $n$ elements in worst-case time $o (n log n)$ . Heapsort and mergesort are therefore asymptotically optimal: they match the lower bound, and no comparison sort can beat them by more than a constant factor.

Two boundaries mark off what this argument does and does not say.

It is an information-theoretic bound. The algorithm must distinguish $n!$ possible answers, and each comparison is a single yes/no question yielding at most one bit; $⌈ log_{2} (n!)⌉$ bits are needed to identify one answer among $n!$ . The lower bound counts questions, independent of any cleverness in choosing them.

One comparison, one bit. Each yes/no comparison at best halves the set of still-possible orderings; starting from $n!$ candidates, narrowing to a single one takes $\geq ⌈ log_{2} (n!)⌉$ comparisons.

It bounds only comparison sorts. The proof leans entirely on the restriction that information arrives one comparison at a time. An algorithm that learns about elements another way, by reading their digits or using them as array indices, sits outside the model, and the $n log n$ floor simply does not apply to it.⁶ The next lesson exploits exactly this loophole to sort in linear time.

Other lower-bound techniques

The leaf-counting argument is one member of a larger family of lower-bound techniques. Three directions are worth knowing, because they show both how far the idea reaches and where it stops.

Adversary arguments. Counting leaves bounds the tree; an adversary argument bounds an algorithm directly by playing the role of a malicious input that answers each comparison in whichever way keeps the most work ahead. The adversary never commits to a fixed permutation — it only promises to stay consistent with every answer it has given — and it steers the algorithm down a long path. This reproves $Ω (n log n)$ for sorting and, more sharply, pins exact constants the leaf bound cannot see. The classic result is that finding both the minimum and maximum of $n$ elements needs exactly $⌈ 3 n /2 ⌉ - 2$ comparisons: an adversary that tracks how many elements have lost and won so far forces any algorithm to spend that many and no fewer. Leaf-counting gives the right growth rate; the adversary gives the right constant.
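A simple adversary for sorting can be sketched in a few lines. The version below keeps the set of rank assignments still consistent with its answers and always answers in favor of the larger half, so each comparison removes at most half of the candidates. It is run against Python's built-in `sorted` as a stand-in comparison sort; the class and function names are hypothetical, and the brute-force candidate list limits it to small sizes.

```python
import math
from functools import cmp_to_key
from itertools import permutations


class HalvingAdversary:
  """Answers comparisons so that as many orderings as possible stay alive."""

  def __init__(self, size: int) -> None:
    # each candidate maps element index -> rank; all n! are possible at first
    self.candidates: list[tuple[int, ...]] = list(permutations(range(size)))
    self.comparisons: int = 0

  def compare(self, first: int, second: int) -> int:
    """cmp-style answer: -1 if element `first` is declared smaller, else 1."""
    self.comparisons += 1
    below = [ranks for ranks in self.candidates if ranks[first] < ranks[second]]
    above = [ranks for ranks in self.candidates if ranks[first] > ranks[second]]
    # keep the larger side, staying consistent with every earlier answer
    if len(below) >= len(above):
      self.candidates = below
      return -1
    self.candidates = above
    return 1


def forced_comparisons(size: int) -> tuple[int, int]:
  """Run sorted() against the adversary; return (comparisons, candidates left)."""
  adversary = HalvingAdversary(size)
  sorted(range(size), key=cmp_to_key(adversary.compare))
  return adversary.comparisons, len(adversary.candidates)


for size in range(2, 7):
  used, remaining = forced_comparisons(size)
  floor = (math.factorial(size) - 1).bit_length()
  print(f"n={size}  comparisons={used}  floor={floor}  candidates left={remaining}")
```

Because the sort is correct, it cannot stop while two candidate orderings remain, so the adversary forces at least $⌈ log_{2} (n!)⌉$ comparisons. Sharper adversaries, such as the one for min-and-max, track problem-specific state instead of the whole candidate set.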

Sorting networks and the 0-1 principle. A sorting network is an oblivious sort: a fixed sequence of compare-exchange operations on wire pairs, chosen in advance with no data-dependent branching, so the same comparisons run regardless of input. That obliviousness suits hardware and SIMD lanes. Proving a network correct on all $n!$ inputs directly is infeasible; the 0-1 principle reduces it: a comparison network sorts every input if and only if it sorts every input of $0$ s and $1$ s (Knuth). Since a comparator's behavior depends only on the relative order of its two inputs, correctness on the $2^{n}$ binary inputs — far fewer than $n!$ , and checkable one threshold at a time — implies correctness on all inputs. Batcher's bitonic network sorts in $Θ (n log^{2} n)$ comparisons across $Θ (log^{2} n)$ parallel depth, and the 0-1 principle is what makes verifying it tractable.

The 0-1 principle. A comparison network's behavior on any input is fixed by the relative order of the compared pairs, so if it sorts all $2^{n}$ binary inputs (bottom, checkable one threshold at a time) it sorts all $n!$ inputs (top).
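The 0-1 principle is easy to exercise on a small network. The sketch below checks a five-comparator network on four wires (a commonly cited optimal layout, used here as an example) against all $2^{4}$ binary inputs, then cross-checks the verdict against all $4!$ permutations. The constant and function names are hypothetical.

```python
from itertools import permutations, product

# a network is a fixed list of compare-exchange operations on wire pairs
Network = list[tuple[int, int]]

FOUR_WIRE: Network = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
BROKEN: Network = FOUR_WIRE[:-1]  # same network with the last comparator removed


def run_network(network: Network, values) -> list[int]:
  """Apply every comparator in order; the sequence never depends on the data."""
  wires = list(values)
  for low, high in network:
    if wires[low] > wires[high]:
      wires[low], wires[high] = wires[high], wires[low]
  return wires


def sorts_all_binary_inputs(network: Network, size: int) -> bool:
  """The 0-1 test: 2**size inputs."""
  return all(
    run_network(network, bits) == sorted(bits)
    for bits in product((0, 1), repeat=size)
  )


def sorts_all_permutations(network: Network, size: int) -> bool:
  """The direct test: size! inputs."""
  return all(
    run_network(network, ordering) == sorted(ordering)
    for ordering in permutations(range(size))
  )


for name, network in (("four-wire", FOUR_WIRE), ("broken", BROKEN)):
  print(name, sorts_all_binary_inputs(network, 4), sorts_all_permutations(network, 4))
```

The two tests agree on both networks, as the principle predicts: the complete network passes both, and the truncated one already fails on the binary input $0, 1, 0, 1$ .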

Beyond comparisons: the algebraic model. The comparison model is not the only one with provable floors. In the algebraic decision tree model, where each node tests the sign of a polynomial in the inputs, one can prove that element distinctness (deciding whether $n$ reals are all different) needs $Ω (n log n)$ operations (Ben-Or, 1983). This matters because it transfers: geometry problems such as deciding whether any two of $n$ points coincide, or finding the closest pair among $n$ points, contain element distinctness as a sub-question and so inherit the $Ω (n log n)$ bound. The lesson is that lower bounds are always relative to a model: enlarge the model (comparisons to indexing, as the next lesson does; or comparisons to algebraic tests) and the floor can move. What the decision-tree argument proves is not that sorting is hard in an absolute sense, but that it is hard for algorithms limited to the questions the model allows.⁶

Takeaways

A comparison sort learns only the outcomes of element comparisons; this restriction is what makes a lower bound provable.
Any comparison sort is a decision tree: internal nodes are comparisons, leaves are output orderings, and worst-case comparisons equal the tree's height.
Correctness forces $\geq n!$ leaves; a height- $h$ binary tree has $\leq 2^{h}$ leaves, so $h \geq log_{2} (n!)$ .
By Stirling, $log_{2} (n!) = n log_{2} n - Θ (n) = Ω (n log n)$ , so every comparison sort needs $Ω (n log n)$ comparisons in the worst case.
The floor survives both escape hatches: by Kraft + Jensen the average input also costs $\geq log_{2} (n!)$ , and a randomized sort is just a distribution over decision trees, so coin flips do not help either.
Mergesort and heapsort hit this bound and are thus asymptotically optimal; beating it requires stepping outside the comparison model.
The idea generalizes: adversary arguments pin exact constants (min-and-max in $⌈ 3 n /2 ⌉ - 2$ ), the 0-1 principle reduces verifying a sorting network to its $2^{n}$ binary inputs, and algebraic decision trees carry an $Ω (n log n)$ floor to element distinctness and the geometry problems built on it. Every lower bound is relative to its model.

¹ CLRS, §8.1, Lower Bounds for Sorting: any comparison sort requires $Ω (n log n)$ comparisons in the worst case.
² Erickson, Algorithms, Ch. (Lower Bounds via Decision Trees): modeling a comparison sort as a binary decision tree whose internal nodes are comparisons and leaves are output orderings.
³ CLRS, §8.1, Lower Bounds for Sorting: a correct sort's decision tree has at least $n!$ reachable leaves, one per permutation.
⁴ CLRS, §8.1, Lower Bounds for Sorting: via Stirling's approximation, $log_{2} (n!) = Ω (n log n)$ , bounding the tree height below.
⁵ CLRS, Problem 8-1, Probabilistic lower bounds on comparison sorting: the average-case bound over uniformly random permutations, and its extension to randomized comparison sorts as distributions over deterministic decision trees.
⁶ Erickson, Algorithms, Ch. (Lower Bounds via Decision Trees): the bound holds only for comparison sorts; algorithms reading digits or indices fall outside the model.