Data Structures/Balanced Search Trees

Lesson 4.51,465 words

Balanced Search Trees

An ordinary BST can degrade to height Θ(n)\Theta(n); balanced search trees guarantee h=O(logn)h = O(\log n) by maintaining invariants and repairing them after every update. We meet rotations, the local restructuring primitive, then red-black trees, whose color invariants force logarithmic height, and finally B-trees, which trade tall-and-thin for short-and-wide to win on disk.

╌╌╌╌

The previous lesson left us with a sharp problem. A binary search tree does everything in time, which is glorious when the tree is balanced and catastrophic when it is not, and we cannot control the order in which keys arrive. A balanced search tree removes the dependence on luck. It maintains a structural invariant strong enough to force , and after every insertion or deletion it does a small amount of rebalancing to restore that invariant.1 The guarantee becomes worst-case, not best-case: per operation, on every input, forever.

Different balanced trees enforce different invariants. AVL trees bound the height difference of sibling subtrees, red-black trees color the nodes, and B-trees pack many keys per node, but they all share one mechanical primitive for reshaping the tree without disturbing its sorted order: the rotation.

Rotations: local surgery that preserves order

A rotation rearranges three pointers to change a tree's shape, and hence its height, while keeping the BST property intact. A right rotation at a node makes 's left child the new subtree root, with becoming 's right child; a left rotation is the exact inverse. The subtree that was in the middle (call it ) switches parents but stays between and in key order.

Left and right rotations reshape a BST while preserving sorted order

In both trees the sorted order is — a rotation never violates the BST property.2 What it does change is height: rotating can pull a deep subtree up a level and push a shallow one down, which is precisely the lever a balanced tree pulls to repair itself.

Algorithm:Left-Rotate(T,x)\textsc{Left-Rotate}(T, x) — pivot xx down-left, its right child yy up
  1. 1
    yright(x)y \gets right(x)
  2. 2
    right(x)left(y)right(x) \gets left(y)
    T2T_2 becomes x's right child
  3. 3
    if left(y)nilleft(y) \ne \text{nil} then
  4. 4
    parent(left(y))xparent(left(y)) \gets x
  5. 5
    parent(y)parent(x)parent(y) \gets parent(x)
    splice y where x was
  6. 6
    if parent(x)=nilparent(x) = \text{nil} then
  7. 7
    root(T)yroot(T) \gets y
  8. 8
    else if x=left(parent(x))x = left(parent(x)) then
  9. 9
    left(parent(x))yleft(parent(x)) \gets y
  10. 10
    else
  11. 11
    right(parent(x))yright(parent(x)) \gets y
  12. 12
    left(y)xleft(y) \gets x
    x hangs under y
  13. 13
    parent(x)yparent(x) \gets y

A rotation touches only a constant number of pointers, so it runs in time. Every balanced-tree rebalancing operation is built from a handful of rotations (plus, for red-black trees, recolorings) along a single root-to-leaf path, hence work to repair the tree after an update.

Red-black trees: balance by color

A red-black tree is a BST in which every node carries one extra bit, its color — red or black.3 Five invariants on those colors squeeze the tree into logarithmic height:

  1. Every node is either red or black.
  2. The root is black.
  3. Every leaf (the nil sentinel) is black.
  4. If a node is red, then both its children are black (no two reds in a row).
  5. For each node, every path from it down to a descendant leaf contains the same number of black nodes — its black-height.

Properties 4 and 5 are the engine. Property 5 says all root-to-leaf paths have the same number of black nodes; property 4 says reds cannot cluster, so reds can at most double a path's length by interleaving. Together they bound how lopsided the tree can get.

Why the height is

The argument is short and worth seeing, because it explains why the color rules take exactly this form. Let be the black-height of node (the number of black nodes on any path from down to a leaf, not counting ).

Now let be the tree's height. By property 4 at least half the nodes on any root-to-leaf path are black, so the root's black-height is at least . Applying the lemma at the root with internal nodes,

So every operation that walks the tree (search, insert, delete, successor) runs in worst-case time. After an insertion or deletion, the new node may violate property 4 (a red child of a red parent) or property 2/5; the tree repairs itself by recoloring and rotating up the path from the change toward the root. Each fix-up is work, the path has length , and the loop terminates either by pushing the violation upward until it vanishes or by a final rotation. We will not work through every case of insert-fixup here; the point is the shape of the guarantee: bounded-height invariant plus rotation-and-recolor repair equals worst-case logarithmic operations.

The most common fix-up is the easy one: when the newly inserted red node has a red parent and a red uncle , we simply recolor — paint the parent and uncle black and the grandparent red — with no rotation at all. That repaints the local subtree to restore no two reds while preserving every path's black-height, but the grandparent is now red, so the violation may reappear one level up; we move to the grandparent and continue. Only when the uncle is black do we resort to one or two rotations, after which the loop ends.

Insert-fixup Case 1 — a red uncle lets us recolor instead of rotate, pushing the violation up to grandparent

The payoff is starkest on the worst possible input for a plain BST: keys arriving already sorted. A naive tree threads them into a single descending path of height ; the red-black invariants instead rebalance on the fly into a bushy tree of height , with every root-to-leaf path carrying the same black-height.

The same keys inserted in order: a plain BST degenerates to height , a red-black tree stays at height with equal black-heights

AVL trees reach the same guarantee with a different invariant, namely that the heights of any node's two subtrees differ by at most , enforced by rotations. AVL trees are more rigidly balanced (shorter, faster lookups) but do more rotations on updates; red-black trees are looser and rebalance more cheaply, which is why they back many standard-library ordered maps.4

B-trees: balance for the disk

Red-black and AVL trees minimize the number of nodes on a root-to-leaf path. But when the data lives on a disk or SSD, the cost that dominates is not comparisons but block transfers: reading one page from disk is millions of times slower than a memory comparison. A binary tree of keys has height , meaning up to disk reads per search. That is the problem solve.

A of minimum degree is a balanced search tree that is short and wide: each internal node holds between and keys (kept sorted within the node) and correspondingly between and children.5 Its defining structural rules:

  • Every node holds between and keys (the root may hold as few as ); a node with keys has exactly children.
  • The keys in a node partition the key space, just as in a BST: child holds all keys between key and key of the node.
  • All leaves are at the same depth, so the tree is perfectly height-balanced by construction.
A B-tree packing several sorted keys per node with all leaves at equal depth

Each node is sized to fill one disk block, so a single read brings in a whole fan-out's worth of keys. Because every node has up to children, a B-tree holding keys has height

The fan-out sits in the denominator: making nodes wide makes the tree shallow. With on the order of a thousand keys per block, a tree of a billion keys has height or , meaning two or three disk reads per search instead of thirty. Insertions and deletions keep all leaves at the same depth by splitting a full node (one with keys) into two and pushing its median key up to the parent, or merging an underfull node with a sibling: local restructuring that ripples at most up one root-to-leaf path.

Splitting a full node (, so keys). The full child overflows, so its median key moves up into the parent and the remaining keys split into two nodes of keys each. The tree grows wider, never taller (unless the root itself splits), keeping all leaves at equal depth.

B-trees (and their variant the B+-tree, which keeps all data in the leaves and links them for range scans) are the index structure databases and filesystems rely on for exactly this reason.

Takeaways

  • A balanced search tree maintains a structural invariant that forces height , repairing it after each update so every operation is worst-case , not merely best-case.
  • Rotations are the primitive that reshapes a tree to change its height while preserving the BST property and sorted order.
  • Red-black trees color nodes and enforce no-two-reds plus equal black-height; a counting argument shows , and fix-ups are rotations/recolorings along one path.
  • AVL trees reach the same bound with a stricter height-difference invariant: shorter trees, more rotations.
  • trade tall-and-thin for short-and-wide, packing keys per node so height is , minimizing disk block transfers, the dominant cost for on-disk data.

Footnotes

  1. Erickson, Ch. — Balanced Binary Search Trees: invariant-plus-rebalancing forces worst-case height.
  2. CLRS, Ch. 13 — Red-Black Trees (§13.2): rotations as the restructuring primitive that preserves the BST property.
  3. CLRS, Ch. 13 — Red-Black Trees (§13.1): the color invariants that bound height to .
  4. Skiena, §3.4 — Balanced Search Trees: red-black trees as the practical balanced BST behind library ordered maps.
  5. CLRS, Ch. 18 — B-Trees (§18.1): the minimum-degree structure with to keys per node, minimizing disk transfers.
Practice

╌╌ END ╌╌