Quicksort · study.

Mergesort does its hard work in the combine step: splitting is trivial, merging is where the sorting happens. $Quicksort$ flips this around. It does its hard work in the divide step, partitioning the array so that everything small comes before everything large, after which the combine step is empty. Sort the two parts in place and the whole array is sorted, with no merging required.¹

The paradigm applied

To sort $A [p .. r]$ :

Divide. Choose a pivot element and partition $A [p .. r]$ into two regions: a left part whose elements are all $\leq$ the pivot, and a right part whose elements are all $\geq$ the pivot, with the pivot itself in between at some index $q$ .
Conquer. Recursively sort $A [p .. q - 1]$ and $A [q + 1.. r]$ .
Combine. Nothing to do: the subarrays are already in place and in order relative to each other.

Algorithm 1: $\textsc{Quicksort}(A, p, r)$, sort $A[p..r]$ in place

if $p < r$ then
  $q \gets$ call $\textsc{Partition}(A, p, r)$  // pivot at final index $q$
  call $\textsc{Quicksort}(A, p, q - 1)$  // everything $\le$ pivot
  call $\textsc{Quicksort}(A, q + 1, r)$  // everything $\ge$ pivot

Everything hinges on $Partition$ .

Partitioning around a pivot

Partition rearranges $A [p .. r]$ around a pivot $x$ so that it falls into three contiguous regions: the elements no larger than the pivot, then the pivot itself sitting at its final sorted index $q$ , then the elements no smaller than it.

Figure: Partition splits the array into the elements no larger than the pivot, the pivot at index $q$ , and the elements no smaller than it.

No element to the left of $q$ exceeds the pivot, and none to the right is smaller, so the pivot is already in its final position. The two recursive sorts never have to look across the boundary at $q$ , which is why the combine step vanishes. All of quicksort's work is in making this split, and making it balanced.

Lomuto partition

The simplest scheme, due to Lomuto, takes the last element $A [r]$ as the pivot and sweeps an index $j$ across the array, maintaining a boundary $i$ between the elements known to be $\leq$ the pivot and those known to be $>$ it.²

Algorithm 2: $\textsc{Partition}(A, p, r)$, Lomuto scheme, pivot $= A[r]$

$x \gets A[r]$  // the pivot
$i \gets p - 1$  // end of $\le x$ region
for $j \gets p$ to $r - 1$ do
  if $A[j] \le x$ then
    $i \gets i + 1$
    exchange $A[i]$ with $A[j]$
exchange $A[i + 1]$ with $A[r]$  // pivot into its slot
return $i + 1$

Correctness of partition

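Partition is correct by a four-region loop invariant: at the start of each iteration of the for loop, $A [p .. i]$ holds only elements $\leq x$ , $A [i + 1 .. j - 1]$ holds only elements $> x$ , $A [j .. r - 1]$ is still unexamined, and $A [r] = x$ . Each iteration moves $A [j]$ into one of the first two regions, and the final exchange drops $x$ between them.

Partition does $Θ (n)$ comparisons on $n = r - p + 1$ elements: exactly one test $A [j] \leq x$ for each of the $n - 1$ values of $j$ .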
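
One way to make the invariant tangible is to assert it at the top of every iteration. The sketch below is an illustration only: `checked_lomuto_partition` is a hypothetical name, it uses 0-based indices (so the pivot of the running example lands at index 3 rather than 4), and it assumes integer keys so that `<=` can be used directly.

```python
def checked_lomuto_partition(values: list[int], low: int, high: int) -> int:
  """Lomuto partition that asserts the four-region invariant each iteration."""
  pivot = values[high]
  boundary = low - 1  # last index of the region known to be <= pivot

  for scan in range(low, high):
    # region 1: values[low..boundary] holds only elements <= pivot.
    assert all(values[k] <= pivot for k in range(low, boundary + 1))
    # region 2: values[boundary+1..scan-1] holds only elements > pivot.
    assert all(values[k] > pivot for k in range(boundary + 1, scan))
    # region 3, values[scan..high-1], is unexamined; region 4 is the pivot.
    assert values[high] == pivot

    if values[scan] <= pivot:
      boundary += 1
      values[boundary], values[scan] = values[scan], values[boundary]

  # drop the pivot between the two classified regions.
  values[boundary + 1], values[high] = values[high], values[boundary + 1]
  return boundary + 1


data = [2, 8, 7, 1, 3, 5, 6, 4]
pivot_index = checked_lomuto_partition(data, 0, len(data) - 1)
print(pivot_index, data)
```

The assertions cost $Θ (n)$ each, so this version is for study, not for use: it turns the linear partition into a quadratic one.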

A snapshot of the sweep makes the four regions concrete. On $A = ⟨ 2, 8, 7, 1, 3, 5, 6, 4 ⟩$ with pivot $x = A [r] = 4$ , just after the scan reaches $j = 5$ the boundary $i$ has collected the small elements on the left, the large ones trail behind, and the rest is still unexamined:

Figure: Lomuto sweep on $A = ⟨ 2, 8, 7, 1, 3, 5, 6, 4 ⟩$ at $j = 5$ : the $\leq x$ region (ending at boundary $i$ ) precedes the $> x$ region, with pivot $x = 4$ parked at the end.

The full sweep is worth tracing once, end to end. Each row covers one iteration of the for loop: the test on $A [j]$ , the action taken, and the array and boundary $i$ once that iteration has finished. Rows that move an element say so in the action column. Recall $x = 4$ and $i$ starts at $p - 1 = 0$ .

$j$	test $A [j] \leq 4$	action	array after	$i$
$1$	$2 \leq 4$ : yes	$i \leftarrow 1$ ; swap $A [1] \leftrightarrow A [1]$ (no-op)	$⟨ 2, 8, 7, 1, 3, 5, 6, 4 ⟩$	$1$
$2$	$8 \leq 4$ : no	—	$⟨ 2, 8, 7, 1, 3, 5, 6, 4 ⟩$	$1$
$3$	$7 \leq 4$ : no	—	$⟨ 2, 8, 7, 1, 3, 5, 6, 4 ⟩$	$1$
$4$	$1 \leq 4$ : yes	$i \leftarrow 2$ ; swap $A [2] \leftrightarrow A [4]$	$⟨ 2, 1, 7, 8, 3, 5, 6, 4 ⟩$	$2$
$5$	$3 \leq 4$ : yes	$i \leftarrow 3$ ; swap $A [3] \leftrightarrow A [5]$	$⟨ 2, 1, 3, 8, 7, 5, 6, 4 ⟩$	$3$
$6$	$5 \leq 4$ : no	—	$⟨ 2, 1, 3, 8, 7, 5, 6, 4 ⟩$	$3$
$7$	$6 \leq 4$ : no	—	$⟨ 2, 1, 3, 8, 7, 5, 6, 4 ⟩$	$3$
—	loop done	swap $A [i + 1] = A [4] \leftrightarrow A [8]$	$⟨ 2, 1, 3, 4, 7, 5, 6, 8 ⟩$	—

$Partition$ returns $q = 4$ : the pivot $4$ sits at its final sorted index with ${2, 1, 3}$ to its left and ${7, 5, 6, 8}$ to its right — neither side sorted yet, but every element on the correct side of the boundary. The recursion takes it from there. Notice how a yes row swaps the scanned small element with the first large element (the one just past $i$ ), leapfrogging the large region one slot to the right.

quicksort_lomuto.py

from typing import TypeVar

from comparable import Comparable

Element = TypeVar("Element", bound=Comparable)

def lomuto_partition(values: list[Element], low: int, high: int) -> int:
  """
    Rearrange `values[low..high]` around the pivot `values[high]` so that\n
    everything left of the returned index is <= the pivot and everything\n
    right of it is > the pivot. Returns the pivot's final index.\n
  """
  pivot: Element = values[high]

  # boundary trails the region known to be <= pivot, initially empty.
  boundary: int = low - 1
  for scan in range(low, high):
    # `<` instead of `<=` so the element type only needs `__lt__`.
    if not (pivot < values[scan]):
      boundary += 1
      values[boundary], values[scan] = values[scan], values[boundary]

  # drop the pivot just past the <= region, into its final slot.
  pivot_index: int = boundary + 1
  values[pivot_index], values[high] = values[high], values[pivot_index]
  return pivot_index

def quicksort_lomuto(values: list[Element]) -> list[Element]:
  """
    Sort `values` in place and return it, using Lomuto partitioning.\n
  """

  def sort_range(low: int, high: int) -> None:
    # partition, then recurse on each side of the pivot's final slot.
    if low < high:
      pivot_index: int = lomuto_partition(values, low, high)
      sort_range(low, pivot_index - 1)
      sort_range(pivot_index + 1, high)

  sort_range(0, len(values) - 1)
  return values

comparable.py

from typing import Any, Protocol, TypeVar


class Comparable(Protocol):
  """
    Anything orderable with `<` (int, float, str, tuple, date, …).\n
  """

  # `other` is position-only so built-ins (int, str, …), whose dunder
  # operands are position-only, structurally satisfy the protocol.
  def __lt__(self, other: Any, /) -> bool: ...
  def __gt__(self, other: Any, /) -> bool: ...
  def __le__(self, other: Any, /) -> bool: ...
  def __ge__(self, other: Any, /) -> bool: ...

Hoare partition

Hoare's original scheme uses two indices that march toward each other from the two ends of the array, swapping each out-of-place pair they find. It does fewer swaps on average than Lomuto and handles arrays with many duplicate keys more gracefully, at the cost of a subtler invariant: the returned index splits the array but is not necessarily the pivot's final position.

The two pointers $i$ and $j$ start outside the array and walk inward: $i$ stops at the first element $\geq x$ that does not belong on the left, $j$ at the first $\leq x$ that does not belong on the right, and the pair is swapped. When the pointers cross, $j$ marks the boundary.

Figure: Hoare partition with pivot $x = A [p]$ : $i$ scans right past small elements, $j$ scans left past large ones, and the out-of-order pair $A [i], A [j]$ is swapped before both resume marching inward.

Algorithm 3: $\textsc{Hoare-Partition}(A, p, r)$, pivot $= A[p]$

$x \gets A[p]$  // the pivot
$i \gets p - 1$
$j \gets r + 1$
repeat
  repeat $j \gets j - 1$ until $A[j] \le x$  // scan down from right
  repeat $i \gets i + 1$ until $A[i] \ge x$  // scan up from left
  if $i < j$ then
    exchange $A[i]$ with $A[j]$
  else
    return $j$  // split point: $A[p..j] \le A[j+1..r]$

With Hoare partition the recursive calls become $Quicksort (A, p, j)$ and $Quicksort (A, j + 1, r)$ , since $j$ is a boundary rather than the pivot's final index. Either scheme yields a correct, in-place quicksort; the difference lies purely in the constants and in robustness to duplicates.

quicksort_hoare.py

from typing import TypeVar

from comparable import Comparable

Element = TypeVar("Element", bound=Comparable)

def hoare_partition(values: list[Element], low: int, high: int) -> int:
  """
    Partition `values[low..high]` around the pivot `values[low]` with two\n
    converging indices. Returns a boundary index `split` such that every\n
    element in `values[low..split]` is <= every element in\n
    `values[split+1..high]`. The pivot is not necessarily at `split`.\n
  """
  # indices start just outside each end and march toward each other.
  pivot: Element = values[low]
  left: int = low - 1
  right: int = high + 1

  while True:
    # walk inward from the right past elements that belong on the right.
    right -= 1
    while values[right] > pivot:
      right -= 1

    # walk inward from the left past elements that belong on the left.
    left += 1
    while values[left] < pivot:
      left += 1

    # not yet crossed: swap the misplaced pair; otherwise the split is found.
    if left < right:
      values[left], values[right] = values[right], values[left]
    else:
      return right

def quicksort_hoare(values: list[Element]) -> list[Element]:
  """
    Sort `values` in place and return it, using Hoare partitioning.\n
  """

  def sort_range(low: int, high: int) -> None:
    if low < high:
      split: int = hoare_partition(values, low, high)

      # the boundary belongs to the left side, so recurse on `low..split`.
      sort_range(low, split)
      sort_range(split + 1, high)

  sort_range(0, len(values) - 1)
  return values

Duplicate keys

The schemes differ most sharply on arrays with many equal elements — a common case in practice (sorting by year, by category, by grade). Take the extreme: an array of $n$ identical keys, which is of course already sorted.

With Lomuto, every test $A [j] \leq x$ succeeds, so $i$ marches in lockstep with $j$ and each swap is a self-swap. The loop ends with $i = r - 1$ , the final exchange swaps $A [r]$ with itself, and the split is $n - 1$ elements versus $0$ . Every level of recursion peels off one element, so the sort costs $Θ (n^{2})$ on an input that is already sorted and entirely equal, even though no element ever needs to move.

With Hoare, both inner scans stop at elements equal to the pivot: $j$ stops at the first $A [j] \leq x$ and $i$ at the first $A [i] \geq x$ , which on an all-equal array is one step each. The two indices walk toward each other one position per round, exchanging equal elements pointlessly but crossing near the middle. The split is balanced, and the sort finishes in $Θ (n log n)$ . Those seemingly wasteful swaps of equal keys are what keep the split balanced.

Figure: An all-equal array. Lomuto classifies every element as $\leq x$ and splits $(n - 1, 0)$ , which is quadratic. Hoare's converging scans swap equal pairs and cross in the middle, splitting evenly.
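
The two splits are easy to measure directly. The following sketch runs a single partition of each kind on a copy of the input and reports the sizes of the two subproblems the recursion would receive. The helper names `lomuto_split` and `hoare_split` are made up for this illustration, and integer keys are assumed.

```python
def lomuto_split(values: list[int]) -> tuple[int, int]:
  """Subproblem sizes after one Lomuto partition (pivot = last element)."""
  items = list(values)
  pivot, boundary = items[-1], -1
  for scan in range(len(items) - 1):
    if items[scan] <= pivot:
      boundary += 1
      items[boundary], items[scan] = items[scan], items[boundary]
  pivot_index = boundary + 1
  # the pivot itself is excluded from both recursive calls.
  return pivot_index, len(items) - 1 - pivot_index


def hoare_split(values: list[int]) -> tuple[int, int]:
  """Subproblem sizes after one Hoare partition (pivot = first element)."""
  items = list(values)
  pivot, left, right = items[0], -1, len(items)
  while True:
    right -= 1
    while items[right] > pivot:
      right -= 1
    left += 1
    while items[left] < pivot:
      left += 1
    if left < right:
      items[left], items[right] = items[right], items[left]
    else:
      # the boundary index belongs to the left subproblem.
      return right + 1, len(items) - 1 - right


all_equal = [7] * 8
print(lomuto_split(all_equal))  # (7, 0): one side gets everything
print(hoare_split(all_equal))   # (4, 4): the scans cross in the middle
```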

The general fix is a three-way partition into $< x \mid = x \mid > x$ : group every key equal to the pivot into a middle block and recurse only on the two strict sides. Duplicate-heavy inputs then get faster, not slower: each partition retires one distinct key for good, so an array with only $k$ distinct keys needs at most $k$ levels of recursion and $O (nk)$ partitioning work, and all-equal input becomes $O (n)$ . The Sort Colors practice problem below applies this partition at $k = 3$ .
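
A minimal sketch of the three-way idea, in the style usually attributed to Dijkstra's Dutch national flag problem. The names `three_way_partition` and `quicksort_three_way`, the choice of the first element as pivot, and the integer keys are all assumptions made for the illustration, not something the text prescribes.

```python
def three_way_partition(values: list[int], low: int, high: int) -> tuple[int, int]:
  """
  Rearrange values[low..high] into  < pivot | == pivot | > pivot.
  Returns the first and last index of the == block.
  """
  pivot = values[low]       # assumed pivot rule: first element
  less_end = low            # values[low..less_end-1] < pivot
  scan = low                # values[less_end..scan-1] == pivot
  greater_start = high      # values[greater_start+1..high] > pivot

  while scan <= greater_start:
    if values[scan] < pivot:
      values[less_end], values[scan] = values[scan], values[less_end]
      less_end += 1
      scan += 1
    elif values[scan] > pivot:
      # the element swapped in from the right is unexamined, so do not advance.
      values[scan], values[greater_start] = values[greater_start], values[scan]
      greater_start -= 1
    else:
      scan += 1

  return less_end, greater_start


def quicksort_three_way(values: list[int]) -> list[int]:
  """Sort in place, skipping the whole block of keys equal to each pivot."""

  def sort_range(low: int, high: int) -> None:
    if low < high:
      first_equal, last_equal = three_way_partition(values, low, high)
      sort_range(low, first_equal - 1)
      sort_range(last_equal + 1, high)

  sort_range(0, len(values) - 1)
  return values


print(quicksort_three_way([2, 0, 2, 1, 1, 0]))
print(three_way_partition([5, 5, 5, 5, 5], 0, 4))  # the == block spans everything
```

On an all-equal array the first partition returns the whole range as the middle block, both recursive calls receive empty ranges, and the sort stops after one linear sweep.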

Worst case versus best case

Partition is always $Θ (n)$ , so quicksort's total cost is governed entirely by how balanced the splits are.

Worst case. Suppose every partition is maximally lopsided, with one side empty and the other holding $n - 1$ elements. This happens, for Lomuto with a last-element pivot, on an array that is already sorted (or reverse sorted). The recurrence is

T (n) = T (n - 1) + T (0) + Θ (n) = T (n - 1) + Θ (n) = Θ (n^{2}) .

The recursion tree degenerates into a path of depth $n$ , with $Θ (n)$ work at the top shrinking by one at each level: $n + (n - 1) + \dots + 1 = Θ (n^{2})$ . Quicksort's worst case is no better than insertion sort.

The reason a sorted array is the worst case for Lomuto is simple: the last-element pivot $A [r]$ is then the largest value, so the whole scan stays $\leq x$ and the partition peels off just the pivot, leaving an empty right side and an $(n - 1)$ -element left side to do it all again.

Figure: On a sorted array Lomuto's pivot $A [r]$ is the maximum, so every element is $\leq x$ : partition strips off one element and recurses on the other $n - 1$ , the maximally lopsided split.
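
The quadratic count can be checked by instrumenting the sort. This sketch counts the tests $A [j] \leq x$ made by Lomuto quicksort with a last-element pivot; it uses an explicit stack of pending ranges instead of recursion so that the depth-$n$ path does not hit Python's recursion limit. The function name `lomuto_comparisons` is hypothetical.

```python
def lomuto_comparisons(values: list[int]) -> int:
  """Count element-versus-pivot tests made by last-element Lomuto quicksort."""
  items = list(values)
  comparisons = 0
  pending = [(0, len(items) - 1)]  # ranges still to be sorted

  while pending:
    low, high = pending.pop()
    if low >= high:
      continue

    pivot, boundary = items[high], low - 1
    for scan in range(low, high):
      comparisons += 1
      if items[scan] <= pivot:
        boundary += 1
        items[boundary], items[scan] = items[scan], items[boundary]
    items[boundary + 1], items[high] = items[high], items[boundary + 1]

    # the pivot now sits at boundary + 1; queue the two sides around it.
    pending.append((low, boundary))
    pending.append((boundary + 2, high))

  return comparisons


size = 100
print(lomuto_comparisons(list(range(size))))         # sorted input
print(lomuto_comparisons(list(range(size, 0, -1))))  # reverse-sorted input
print(size * (size - 1) // 2)                        # n(n-1)/2 for reference
```

Both degenerate inputs hit $n (n - 1)/2$ exactly, the sum $(n - 1) + (n - 2) + \dots + 1$ of the per-level partition costs.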

Best case. If every partition splits evenly, the recurrence is the mergesort recurrence,

T (n) = 2 T (n /2) + Θ (n) = Θ (n log n) .

Near-balance suffices. Balance need not be perfect. Even a fixed $9$ -to- $1$ split gives

T (n) = T (9 n /10) + T (n /10) + Θ (n) = Θ (n log n),

because the recursion tree still has only $Θ (log n)$ levels (the longest root-to-leaf path shrinks by a factor of $10/9$ each step) and each level does $O (n)$ work. Any split by a constant fraction yields $Θ (n log n)$ . Only splits that are lopsided by a constant number of elements, like the worst case, push us to quadratic.

Figure: Balanced splits keep the tree $Θ (log n)$ deep for a total of $Θ (n log n)$ ; a constant-size lopsided split stretches it into a depth-$n$ path costing $Θ (n^{2})$ .

Here is the $9$ -to- $1$ tree in more detail. Every level still sums to at most $c n$ , because the children of any node partition (at most) that node's elements. What changes is the depth: the left spine dies out after $log_{10} n$ levels, the right spine survives for $log_{10/9} n \approx 6.6 log_{2} n$ levels, and everything in between falls somewhere in the middle. A constant multiple of $log n$ levels at $\leq c n$ apiece is still $O (n log n)$ — lopsidedness by a constant fraction only bloats the constant.

Figure: The $9$-to-$1$ recursion tree: each level sums to at most $c n$ , and even the long right spine dies after $log_{10/9} n = Θ (log n)$ levels, so the total stays $O (n log n)$ .
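
The two spine lengths can be checked numerically. The sketch below walks a subproblem size down the short spine (keeping one tenth each step) and the long spine (keeping nine tenths) using integer arithmetic with floors, so the counts come out slightly below the real-valued logarithms. `spine_depths` is a name invented for this illustration.

```python
import math


def spine_depths(size: int) -> tuple[int, int]:
  """Levels until `size` shrinks to <= 1 along the 1/10 and the 9/10 spine."""
  short_levels, current = 0, size
  while current > 1:
    current //= 10            # keep the small side of a 9-to-1 split
    short_levels += 1

  long_levels, current = 0, size
  while current > 1:
    current = current * 9 // 10   # keep the large side
    long_levels += 1

  return short_levels, long_levels


size = 10**6
short_levels, long_levels = spine_depths(size)
print(short_levels, long_levels)

# ratio of the long spine's length to log2(n); the text rounds it to 6.6.
print(math.log(size, 10 / 9) / math.log2(size))
```

Both depths are constant multiples of $log n$ ; only the constant differs, which is the whole point of the $9$-to-$1$ example.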

The shrinking-recurrence lemma

This "constant fraction is enough" intuition can be stated as a lemma that we will reuse for linear-time selection: if $T (n) \leq T (λ n) + T (μ n) + c n$ for constants $λ, μ \geq 0$ with $λ + μ < 1$ , then $T (n) = O (n)$ . In words, as long as the recursive subproblems together are a constant fraction smaller than the original, linear work at each level collapses to linear work overall.

The proof is a tidy induction that pins the hidden constant exactly.
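
The inductive step can be written out in a few lines. This is a sketch under stated assumptions: the recurrence is taken in the form $T (n) \leq T (λ n) + T (μ n) + c n$ with $λ + μ < 1$ , floors and the base case are ignored, and the hypothesis is $T (m) \leq a m$ for all $m < n$ .

```latex
\begin{align*}
T(n) &\le T(\lambda n) + T(\mu n) + c n \\
     &\le a \lambda n + a \mu n + c n
        && \text{(induction hypothesis)} \\
     &= a n - \bigl( a (1 - \lambda - \mu) - c \bigr) n \\
     &\le a n
        && \text{whenever } a \ge \frac{c}{1 - \lambda - \mu}.
\end{align*}
```

The slack $1 - λ - μ$ is what pays for the $c n$ term at every level; when $λ + μ = 1$ there is no slack left and no finite $a$ works.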

For balanced quicksort the per-level work does not shrink: the two sides of each split sum to the whole array ( $λ + μ = 1$ , the boundary case the lemma deliberately excludes), so each of the $Θ (log n)$ levels still costs $Θ (n)$ , which is why quicksort is $Θ (n log n)$ and not $Θ (n)$ . The lemma's strict inequality $λ + μ < 1$ is what separates "recurse on both halves" (sorting) from "throw away a constant fraction and recurse on one piece" (selection).

shrinking_recurrence.py

from functools import lru_cache

def linear_bound_constant(
  shrink_left: float,
  shrink_right: float,
  linear_coefficient: float,
  base_cost: float,
) -> float:
  """
    The constant `a` from the lemma's proof, the slope of the ceiling\n
    T(n) <= a*n: a = max(base_cost, coeff/(1 - shrink_left - shrink_right)),\n
    where coeff is `linear_coefficient`. Requires shrink_left + shrink_right < 1.\n
  """
  # the lemma only bounds when the shrinkage leaves room to spend.
  total_shrink: float = shrink_left + shrink_right
  if total_shrink >= 1.0:
    raise ValueError("lemma requires shrink_left + shrink_right < 1")

  return max(base_cost, linear_coefficient / (1.0 - total_shrink))

def recurrence_cost(
  size: int,
  shrink_left: float,
  shrink_right: float,
  linear_coefficient: float,
  base_cost: float,
  base_threshold: int = 1,
) -> float:
  """
    Evaluate T(size) exactly for\n
      T(n) = T(floor(shrink_left*n)) + T(floor(shrink_right*n)) + coeff*n,\n
    with T(n) = base_cost for n <= base_threshold. Sizes shrink by a floor at\n
    each step, so the recursion terminates and memoizes cleanly.\n
  """

  @lru_cache(maxsize=None)
  def solve(current: int) -> float:
    # small sizes bottom out at the flat base cost.
    if current <= base_threshold:
      return base_cost

    # each child shrinks by a floor, so the recursion always terminates.
    left_size: int = int(shrink_left * current)
    right_size: int = int(shrink_right * current)
    return solve(left_size) + solve(right_size) + linear_coefficient * current

  return solve(size)

Why randomization helps

The danger is a pivot rule that an adversary, or merely unlucky real-world data, can drive into the worst case. The fix is to randomize: choose the pivot uniformly at random from $A [p .. r]$ (equivalently, swap a random element to the end before running Lomuto partition).

Algorithm 4: $\textsc{Randomized-Partition}(A, p, r)$

$k \gets$ a uniformly random integer in $[p, r]$
exchange $A[k]$ with $A[r]$  // randomize pivot, reuse Lomuto
return call $\textsc{Partition}(A, p, r)$

Now no particular input is bad: the coin flips, not the input order, decide the split. The worst case still exists in principle (every random choice could be unlucky), but its probability is vanishingly small, and we can prove the expected running time is $Θ (n log n)$ on every input.³

randomized_quicksort.py

import os
import random
import sys
from typing import Optional, TypeVar

sys.path.append(
  os.path.join(os.path.dirname(__file__), "..", "..", "data-structures")
)

from comparable import Comparable

Element = TypeVar("Element", bound=Comparable)

def _lomuto_partition(values: list[Element], low: int, high: int) -> int:
  """
    The Lomuto sweep around pivot `values[high]`, returning its final index.\n
  """
  # boundary trails the region known to be <= pivot, initially empty.
  pivot: Element = values[high]
  boundary: int = low - 1

  # sweep the slice, pulling each <= element into the boundary region.
  for scan in range(low, high):
    if not pivot < values[scan]:
      boundary += 1
      values[boundary], values[scan] = values[scan], values[boundary]

  # drop the pivot just past the <= region, into its final slot.
  pivot_index: int = boundary + 1
  values[pivot_index], values[high] = values[high], values[pivot_index]
  return pivot_index

def randomized_partition(
  values: list[Element],
  low: int,
  high: int,
  rng: random.Random,
) -> int:
  """
    Swap a uniformly random element of `values[low..high]` to the end, then\n
    run the Lomuto partition. Returns the chosen pivot's final index.\n
  """
  # move a random element to the end, then partition around it.
  chosen: int = rng.randint(low, high)
  values[chosen], values[high] = values[high], values[chosen]
  return _lomuto_partition(values, low, high)

def randomized_quicksort(
  values: list[Element],
  rng: Optional[random.Random] = None,
) -> list[Element]:
  """
    Sort `values` in place and return it, picking each pivot at random.\n
    Pass an `rng` (a seeded `random.Random`) for reproducible runs.\n
  """
  # default to a fresh source of randomness when no rng is supplied.
  generator: random.Random = rng if rng is not None else random.Random()

  def sort_range(low: int, high: int) -> None:
    # partition, then recurse on each side of the pivot's final slot.
    if low < high:
      pivot_index: int = randomized_partition(values, low, high, generator)
      sort_range(low, pivot_index - 1)
      sort_range(pivot_index + 1, high)

  sort_range(0, len(values) - 1)
  return values

The expected-comparisons argument

Let the sorted order of the elements be $z_{1} < z_{2} < \dots < z_{n}$ , and let $Z_{ij} = {z_{i}, \dots, z_{j}}$ . Two elements are compared at most once over the whole run, since comparisons only ever happen against a pivot, and a pivot is removed from future partitions. Define the indicator $X_{ij} = 1 [z_{i} is compared with z_{j}]$ . The total comparison count is $X = \sum_{i < j} X_{ij}$ , so by linearity of expectation

E [X] = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} Pr [z_{i} compared with z_{j}] .

The combinatorial fact that matters: $z_{i}$ and $z_{j}$ are compared iff the first pivot chosen from the range $Z_{ij}$ is either $z_{i}$ or $z_{j}$ . If instead some middle element $z_{m}$ (with $i < m < j$ ) is picked first, it splits $z_{i}$ and $z_{j}$ into different subarrays and they never meet.

Endpoints

z_{i}, z_{j}

are compared only when the first pivot drawn from

Z_{ij}

is one of them; any middle pivot

z_{m}

separates them forever.

Since the first pivot drawn from the $j - i + 1$ elements of $Z_{ij}$ is equally likely to be any of them,

Pr [z_{i} compared with z_{j}] = \frac{2}{j - i + 1} .

Substituting and reindexing with $k = j - i$ ,

E [X] = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{2}{j - i + 1} < \sum_{i = 1}^{n - 1} \sum_{k = 1}^{n} \frac{2}{k} = 2 \sum_{i = 1}^{n - 1} H_{n} = O (n log n),

using the harmonic-number bound $H_{n} = \sum_{k = 1}^{n} 1/ k = Θ (log n)$ . The bound is tight: the double sum evaluates exactly to $2 (n + 1) H_{n} - 4 n$ . So randomized quicksort makes $Θ (n log n)$ comparisons in expectation, regardless of the input arrangement.

Two sanity checks on the formula $2/ (j - i + 1)$ . Adjacent elements in sorted order ( $j = i + 1$ ) are compared with probability $1$ — and indeed they must be: no third element can separate them, and a comparison sort that never compares them cannot know their order. Meanwhile the minimum and maximum ( $i = 1$ , $j = n$ ) are compared with probability $2/ n$ : almost any first pivot splits them apart immediately.
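
The sum is small enough to evaluate exactly. The sketch below adds up $2/ (j - i + 1)$ over all pairs with rational arithmetic, grouping pairs by their gap $j - i$ , and compares the result with the standard closed form $2 (n + 1) H_{n} - 4 n$ . The function names are made up for the illustration, and distinct keys are assumed, as in the argument above.

```python
from fractions import Fraction


def expected_comparisons(count: int) -> Fraction:
  """Sum of 2/(j - i + 1) over all pairs i < j among `count` distinct keys."""
  total = Fraction(0)
  for gap in range(1, count):
    # exactly (count - gap) pairs have j - i == gap.
    total += Fraction(2 * (count - gap), gap + 1)
  return total


def harmonic(count: int) -> Fraction:
  """The harmonic number H_count = 1 + 1/2 + ... + 1/count."""
  return sum((Fraction(1, k) for k in range(1, count + 1)), Fraction(0))


for count in (2, 3, 10, 100):
  exact = expected_comparisons(count)
  closed_form = 2 * (count + 1) * harmonic(count) - 4 * count
  print(count, float(exact), exact == closed_form)
```

For $n = 3$ the sum is $8/3$ : the two adjacent pairs contribute $1$ each, and the pair (minimum, maximum) contributes $2/3$ .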

Engineering the recursion

A textbook quicksort recurses to $n = 1$ and uses whatever pivot rule it was given. Production quicksorts make three standard adjustments.

Pivot selection. Median-of-three pivots on the median of the first, middle, and last elements. It makes sorted and reverse-sorted inputs split perfectly instead of catastrophically, and it halves the chance of a bad split on random data. It is a heuristic, not a guarantee — fixed rules always leave some adversarial ordering quadratic, which is why libraries either randomize or monitor the recursion depth.

Cutoff to insertion sort. As with mergesort, recursing to singletons drowns small subarrays in call overhead. Below a threshold of $\approx 10$ elements, stop; either run insertion sort on each little piece, or (a classic trick) leave the pieces unsorted and finish with one insertion-sort pass over the whole array, which is linear because every element is already within a constant distance of its final position.

Bounded stack. Worst-case inputs threaten not just $Θ (n^{2})$ time but $Θ (n)$ recursion depth — a stack overflow, not merely a slowdown. The fix is to recurse only into the smaller side and loop on the larger one (tail-call elimination by hand). The recursive subproblem is then at most half its parent, so the stack never exceeds $log_{2} n$ frames, even when the running time degenerates.
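
The three adjustments compose naturally. What follows is a minimal sketch rather than a production sort: the cutoff value of 10, the function names, and the use of Lomuto partitioning with integer keys are assumptions chosen for the illustration.

```python
CUTOFF = 10  # assumed threshold; the text only says "about 10"


def insertion_sort_range(values: list[int], low: int, high: int) -> None:
  """Sort values[low..high] in place by insertion."""
  for index in range(low + 1, high + 1):
    item, slot = values[index], index
    while slot > low and item < values[slot - 1]:
      values[slot] = values[slot - 1]
      slot -= 1
    values[slot] = item


def median_of_three_to_end(values: list[int], low: int, high: int) -> None:
  """Order first/middle/last, then park their median at values[high]."""
  middle = (low + high) // 2
  if values[middle] < values[low]:
    values[low], values[middle] = values[middle], values[low]
  if values[high] < values[low]:
    values[low], values[high] = values[high], values[low]
  if values[high] < values[middle]:
    values[middle], values[high] = values[high], values[middle]
  # now values[low] <= values[middle] <= values[high]: the median is in the middle.
  values[middle], values[high] = values[high], values[middle]


def lomuto_partition(values: list[int], low: int, high: int) -> int:
  """Lomuto sweep around pivot values[high]; returns its final index."""
  pivot, boundary = values[high], low - 1
  for scan in range(low, high):
    if values[scan] <= pivot:
      boundary += 1
      values[boundary], values[scan] = values[scan], values[boundary]
  values[boundary + 1], values[high] = values[high], values[boundary + 1]
  return boundary + 1


def engineered_quicksort(values: list[int]) -> list[int]:
  """Median-of-three pivot, insertion-sort cutoff, smaller-side recursion."""

  def sort_range(low: int, high: int) -> None:
    while high - low + 1 > CUTOFF:
      median_of_three_to_end(values, low, high)
      pivot_index = lomuto_partition(values, low, high)
      # recurse into the smaller side, then loop on the larger one.
      if pivot_index - low < high - pivot_index:
        sort_range(low, pivot_index - 1)
        low = pivot_index + 1
      else:
        sort_range(pivot_index + 1, high)
        high = pivot_index - 1
    # at most CUTOFF elements remain in this range.
    insertion_sort_range(values, low, high)

  sort_range(0, len(values) - 1)
  return values


print(engineered_quicksort([9, 4, 7, 1, 8, 2, 6, 3, 5, 0, 11, 10]))
```

Because each recursive call receives the smaller side, its range is at most half of its parent's, which is what keeps the stack logarithmic even when the running time is not. This sketch still uses Lomuto partitioning, so all-equal input remains quadratic in time.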

Quicksort versus mergesort

	Quicksort	Mergesort
Worst case	$Θ (n^{2})$	$Θ (n log n)$
Expected / average	$Θ (n log n)$	$Θ (n log n)$
Extra space	$Θ (log n)$ (stack)	$Θ (n)$
In place	yes	no
Stable	no	yes
Constants	small (cache-friendly)	larger

In practice quicksort is usually the fastest comparison sort on arrays in memory: it works in place, has tight inner loops, and accesses memory sequentially, a cache-friendly pattern.⁴ Its weaknesses are the $Θ (n^{2})$ worst case (tamed by randomization or median-of-three pivoting) and instability. Mergesort wins when you need a worst-case guarantee, stability, or are sorting linked lists or data too large for memory. A common engineering compromise, introsort, runs quicksort but switches to heapsort once the recursion depth exceeds a limit proportional to $log n$ (typically about $2 log_{2} n$ ), capturing quicksort's speed with a worst-case $Θ (n log n)$ ceiling.
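
The introsort idea fits in a few lines. This is a simplified sketch, not the algorithm as shipped in any library: the depth limit of $2 ⌊log_{2} n⌋$ is an assumed, commonly quoted choice, the heapsort fallback copies the slice through Python's `heapq` module instead of sorting in place, and the names are invented for the illustration.

```python
import heapq


def lomuto_partition(values: list[int], low: int, high: int) -> int:
  """Lomuto sweep around pivot values[high]; returns its final index."""
  pivot, boundary = values[high], low - 1
  for scan in range(low, high):
    if values[scan] <= pivot:
      boundary += 1
      values[boundary], values[scan] = values[scan], values[boundary]
  values[boundary + 1], values[high] = values[high], values[boundary + 1]
  return boundary + 1


def heapsort_range(values: list[int], low: int, high: int) -> None:
  """Sort values[low..high] through a binary heap (copies the slice)."""
  heap = values[low:high + 1]
  heapq.heapify(heap)
  for index in range(low, high + 1):
    values[index] = heapq.heappop(heap)


def introsort(values: list[int]) -> list[int]:
  """Quicksort that hands a range to heapsort once it has recursed too deep."""

  def sort_range(low: int, high: int, depth_left: int) -> None:
    if low >= high:
      return
    if depth_left == 0:
      # too many lopsided splits in a row: stop trusting the pivots.
      heapsort_range(values, low, high)
      return
    pivot_index = lomuto_partition(values, low, high)
    sort_range(low, pivot_index - 1, depth_left - 1)
    sort_range(pivot_index + 1, high, depth_left - 1)

  size = len(values)
  # bit_length() - 1 is floor(log2(size)) for size >= 1.
  depth_limit = 2 * (size.bit_length() - 1) if size > 0 else 0
  sort_range(0, size - 1, depth_limit)
  return values


# sorted input is the worst case for a last-element pivot; the fallback absorbs it.
print(introsort(list(range(5000))) == list(range(5000)))
```

Without the depth limit, the sorted input of 5000 elements would drive a plain recursive Lomuto quicksort past Python's default recursion limit; with it, the recursion stops after a couple of dozen levels and heapsort finishes the remaining range in $O (n log n)$ .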

What standard libraries ship

Introsort was the 1997 answer; the sorts shipping in today's standard libraries have moved past it in two directions.

Pattern-defeating quicksort (pdqsort). The heuristics of the previous section each leave some input slow. pdqsort (Peters, 2016), now the unstable sort_unstable in Rust and the basis of libc++'s std::sort, hardens them into guarantees. It keeps introsort's heapsort fallback for the worst case, but adds two adaptive tricks: it detects already-sorted or reverse-sorted runs and short-circuits them toward linear time, and — the pattern-defeating part — when it notices a partition was badly unbalanced (a sign an adversary or bad pattern is at work) it injects randomness into pivot choice for that subtree, so no fixed input pattern stays quadratic. It captures median-of-three's speed on ordinary data, adaptivity on structured data, and a hard $Θ (n log n)$ ceiling, all at once.

Dual-pivot partitioning. Java's Arrays.sort for primitives uses a dual-pivot quicksort (Yaroslavskiy, 2009): pick two pivots $p \leq q$ and partition into three regions — $< p$ , between $p$ and $q$ , and $> q$ — in a single sweep. It does more comparisons per element than the classic scheme but noticeably fewer cache misses and data movements, and on modern memory hierarchies that trade wins. It is a reminder that the comparison count, the quantity this lesson's analysis minimizes, is no longer the whole cost on real hardware.

Fighting branch misprediction: BlockQuicksort. On a modern CPU the hidden cost of partitioning is the unpredictable branch if A[j] <= x: half the time it mispredicts, flushing the pipeline. BlockQuicksort (Edelkamp and Weiß, 2016) removes the branch by computing, for a block of elements at a time, an array of indices that need swapping and then swapping them with straight-line, branchless code. The comparison count is unchanged, but eliminating the mispredictions makes it substantially faster in wall-clock time — another case where the theoretical model and the machine disagree, and the engineering follows the machine.

Takeaways

$Quicksort$ front-loads the work into $Partition$ ; once the array is partitioned around a pivot $x$ into $< x ∣ x ∣> x$ , the pivot is in its final slot and the recursive sorts need no combine step.
$Lomuto$ (single sweep, pivot at the end) and $Hoare$ (two converging indices) are both correct via partition loop invariants; Hoare does fewer swaps and handles duplicates better.
Cost is set by split balance: lopsided-by-a-constant gives $Θ (n^{2})$ , but any constant-fraction split gives $Θ (n log n)$ . The shrinking-recurrence lemma ( $λ + μ < 1 \Rightarrow T (n) = O (n)$ ) makes "constant fraction is enough" rigorous and powers linear-time selection next door.
Randomizing the pivot makes the expected cost $Θ (n log n)$ on every input; the proof counts each pair's $2/ (j - i + 1)$ chance of being compared.
Duplicates expose the schemes' difference: all-equal input drives Lomuto quadratic while Hoare stays balanced; a three-way $< x ∣= x ∣> x$ partition makes duplicate-heavy inputs faster, not slower.
Production quicksorts add median-of-three pivoting, an insertion-sort cutoff for small subarrays, and smaller-side-first recursion to cap the stack at $O (log n)$ .
Quicksort is typically the fastest in-memory sort; mergesort wins on worst-case guarantees, stability, and external data.

Footnotes

¹ Erickson, Algorithms, Ch. 1, Recursion: quicksort as divide-and-conquer that front-loads the work into partitioning, leaving an empty combine step.
² CLRS, Ch. 7, Quicksort: the Lomuto single-sweep partition scheme and its four-region loop invariant.
³ CLRS, Ch. 7, Quicksort: randomized quicksort and the proof that its expected comparison count is $Θ (n log n)$ on every input.
⁴ Skiena, The Algorithm Design Manual, §4, Sorting and Searching: quicksort as the fastest in-memory comparison sort in practice, and the role of pivot selection.
