Sorting & Order Statistics/Sorting in Linear Time

Lesson 3.31,542 words

Sorting in Linear Time

The Ω(nlogn)\Omega(n\log n) barrier only binds algorithms that compare. By instead using keys as array indices we slip past it: counting sort runs in Θ(n+k)\Theta(n+k) and is stable, radix sort layers it digit by digit, and bucket sort averages Θ(n)\Theta(n) on uniform data.

╌╌╌╌

The previous lesson proved a wall: any sort that learns only by comparing elements needs comparisons. That proof has one weakness, and it is our opening here. It assumes the algorithm extracts information one comparison at a time. If instead we treat keys as data we can read, using a key directly as an array index or peeling it apart digit by digit, the decision-tree argument never gets off the ground, and we can sort in linear time.1 The price is a loss of generality: these algorithms need keys drawn from a small or structured universe, not arbitrary comparables.

Counting sort

Suppose every key is an integer in the range . Counting sort never compares two elements. Instead it counts, for each value , how many keys are ; that count is exactly the final position of the last key equal to . Reading the input back-to-front and decrementing as we place, we drop each element straight into its sorted slot.

Algorithm 1:Counting-Sort(A,B,k)\textsc{Counting-Sort}(A, B, k) — sort A[1..n]A[1..n] with keys in [0,k][0, k] into BB
  1. 1
    let C[0..k]C[0..k] be a new array
  2. 2
    for v0v \gets 0 to kk do
  3. 3
    C[v]0C[v] \gets 0
  4. 4
    for j1j \gets 1 to A.lengthA.length do
  5. 5
    C[A[j]]C[A[j]]+1C[A[j]] \gets C[A[j]] + 1
    C[v] = count of keys = v
  6. 6
    for v1v \gets 1 to kk do
  7. 7
    C[v]C[v]+C[v1]C[v] \gets C[v] + C[v - 1]
    C[v] = count of keys v\le v
  8. 8
    for jA.lengthj \gets A.length downto 11 do
  9. 9
    B[C[A[j]]]A[j]B[C[A[j]]] \gets A[j]
  10. 10
    C[A[j]]C[A[j]]1C[A[j]] \gets C[A[j]] - 1
    next equal key goes before it

The first count loop tallies occurrences; the prefix-sum loop turns counts into ranks (how many keys land at or before each value); the final loop scatters each element into its slot in the output array . Walking the input from down to is what makes the sort stable. Equal keys are emitted in their original relative order, because the last such key claims the highest of the slots reserved for that value, and earlier ones fill in below it.

Counting sort on : tally counts, prefix-sum into ranks, then scatter into the output .

Zooming in on the scatter loop makes the mechanism, and the stability, concrete. Reading from the back, the last key looks up its rank and drops straight into ; we then decrement to , so the next we meet (an earlier one in ) lands in , just before it, preserving input order.

One step of the scatter loop. The last key reads its rank , is written to , and then is decremented to so the previous falls just before it — the source of stability.

Analysis. The loops run , , , and times, so counting sort is in both time and space. As long as this is , genuinely linear, beating the comparison bound because no comparisons happen.2 The catch is the space and time in . If the keys range over, say, -bit integers, then dwarfs any realistic , the count array is enormous, and the method falls apart. Counting sort works best when the key universe is small.

Radix sort

What if the keys are larger, say -digit numbers, so that a single counting pass is infeasible? Radix sort decomposes each key into digits and sorts one digit at a time. The counterintuitive rule, known since the days of punched-card machines, is to sort by the least significant digit first (LSD), working up to the most significant.

Algorithm 2:Radix-Sort(A,d)\textsc{Radix-Sort}(A, d) — sort dd-digit keys, least significant digit first
  1. 1
    for i1i \gets 1 to dd do
  2. 2
    use a stable sort to sort AA on digit ii
    digit 1 = least sig.

The correctness rests entirely on stability. After sorting on digit , suppose the array is correctly ordered by the low-order digits . Now sort on digit with a stable sort. Two keys differing in digit are ordered correctly by that pass. Two keys agreeing in digit are left in their incoming order by stability, and that incoming order was already correct on digits . So after the pass the array is correctly ordered on digits . By induction, after all passes the array is fully sorted.3 Using an unstable per-digit sort would destroy the work of every earlier pass. This is why the inner sort must be stable, and counting sort is the natural choice.

LSD radix sort over three digits; each pass stably sorts on one digit, and ties keep the previous pass's order until the array is sorted.

Zooming in on a single pass shows why stability is not optional. Suppose the array is already ordered on the low digit, and we now sort on the next one. Keys that tie on the new digit must keep their incoming order, since that order already encodes the lower digit; only keys that differ on the new digit may be reordered.

Why stability is essential. Sorting on the tens digit, keys that tie there (both ) keep their incoming order, preserving the units sort; only keys that differ on the tens digit cross.

Analysis. With counting sort on each of digits, each drawn from a range of size , every pass costs , for a total of

When is a constant and , for example fixed-width integers split into a constant number of digits in a base of size , radix sort runs in . Choosing the digit size is an engineering tradeoff: larger digits mean fewer passes ( shrinks) but a larger per pass. For -bit keys, the best choice is typically digits of about bits, so and .

Bucket sort

Counting and radix sort exploit integer keys. Bucket sort instead exploits a distributional assumption: that the keys are drawn (roughly) uniformly at random from an interval, say . It scatters the keys into equal sub-intervals, the buckets, sorts each bucket with a simple sort like insertion sort, then concatenates the buckets in order.

Bucket sort on keys in . Key drops into bucket ; each bucket is sorted and the buckets concatenated left to right. Uniform keys give per bucket.
Algorithm 3:Bucket-Sort(A)\textsc{Bucket-Sort}(A) — sort nn keys drawn uniformly from [0,1)[0, 1)
  1. 1
    nA.lengthn \gets A.length
  2. 2
    let B[0..n1]B[0..n-1] be an array of empty lists
  3. 3
    for i1i \gets 1 to nn do
  4. 4
    insert A[i]A[i] into list B[nA[i]]B[\,\floor{n \cdot A[i]}\,]
    bucket by value
  5. 5
    for i0i \gets 0 to n1n - 1 do
  6. 6
    sort list B[i]B[i] with insertion sort
  7. 7
    concatenate B[0],B[1],,B[n1]B[0], B[1], \dots, B[n-1] in order

Scattering is , and concatenation is . The only variable cost is sorting the buckets. If the input is spread uniformly, each bucket holds about one element on average, so the insertion sorts cost each in expectation.4

Analysis. Let . Insertion sort on bucket costs , so the expected total bucket-sorting cost is . Each key lands in bucket independently with probability , so is Binomial, which has

Summing over the buckets gives , so the total expected running time is

This is an average-case result: it assumes the inputs are uniformly distributed. Adversarial input, with every key landing in the same bucket, degrades bucket sort to the of a single insertion sort. Bucket sort is the right tool when you know your data is spread evenly (or can cheaply map it so), as with fractional parts of well-mixed values.

Choosing among them

None of these linear-time sorts is a drop-in replacement for a comparison sort like mergesort or heapsort. Each rests on a structural assumption about the keys, so the choice comes down to matching the algorithm to what you know about your data.5

AlgorithmAssumption on keysTimeStable?Extra space
Counting sortintegers in a small range yes
Radix sort digits, each in a small rangeyes
Bucket sortreals spread uniformly over an interval expectedyes

Practical guidance:

  • Reach for counting sort when keys are integers over a range comparable to (grades, small ages, byte values). It is also the standard stable subsort inside radix sort.
  • Reach for radix sort for fixed-width keys with a larger range, such as - or -bit integers or fixed-length strings, where a single counting pass would need an impossibly large count array.
  • Reach for bucket sort when keys are real numbers you believe are uniformly (or near-uniformly) distributed, and you want linear expected time.

And remember the boundary: these methods beat precisely because they are not comparison sorts. They never ask is ?; they compute with the keys. Hand any of them arbitrary comparable objects with no exploitable integer or distributional structure, and the linear-time guarantee is gone, leaving you back at a comparison sort and its floor.

Takeaways

  • The bound binds only comparison sorts; using keys as array indices or digit sequences sidesteps it entirely.
  • Counting sort ranks keys by prefix-summing their counts: , stable, linear when but ruinous when is large.
  • Radix sort stably sorts digit by digit, least significant first; stability is what preserves earlier passes, giving .
  • Bucket sort scatters uniform keys into buckets and sorts each; expected , but if the distribution is adversarial.
  • Each linear sort trades generality for a structural assumption on the keys, so choose by what you actually know about your data.

Footnotes

  1. Erickson, Algorithms, Ch. — Sorting Beyond Comparisons — treating keys as readable data sidesteps the decision-tree argument and permits linear-time sorting.
  2. CLRS, §8.2 — Counting Sort — counting sort runs in , is stable, and is linear when .
  3. CLRS, §8.3 — Radix Sort — sorting least-significant digit first with a stable subsort yields a correct sort in .
  4. CLRS, §8.4 — Bucket Sort — scattering uniformly distributed keys into buckets gives expected running time.
  5. Skiena, The Algorithm Design Manual, §4 — Sorting and Searching — choosing the right sort by matching the algorithm to the structure of the keys.
Practice

╌╌ END ╌╌