Data Structures/Binary Search Trees

Lesson 4.31,586 words

Binary Search Trees

A binary search tree keeps keys ordered so that every operation follows a single root-to-leaf path. We state the BST property, give search and insert, find minimum, maximum, and successor, see that an inorder walk emits the keys in sorted order, and confront the catch — every operation costs O(h)O(h), and a carelessly built tree degrades to height h=Θ(n)h = \Theta(n), motivating balance.

╌╌╌╌

A hash table gives expected lookups but throws away order: it cannot tell you the smallest key, the next key after a given one, or every key in a range. A binary search tree (BST) keeps those queries fast by storing keys in a shape that records their order. Each node holds a key and pointers to a left child, a right child, and a parent; the keys are arranged so that the tree itself is a kind of decision diagram for searching. The result is a dynamic ordered dictionary supporting search, insert, delete, minimum, maximum, predecessor, successor, and in-order traversal, every one of them in time proportional to the tree's height.

The binary search tree property

The arrangement is governed by one local invariant, checked at every node :

Smaller keys live to the left, larger keys to the right, everywhere, not just between a node and its immediate children.1 This recursive constraint is exactly what lets a search discard half the tree at each step.

A binary search tree with smaller keys left and larger keys right

Reading this tree: the root is ; everything in its left subtree () is and everything in its right subtree () is , and the same holds recursively at and .

Searching

To search for a key , start at the root and walk down. At each node, if equals the node's key we are done; if is smaller we go left, otherwise we go right. Each comparison drops us one level, so the search traces a single root-to-leaf path.

Algorithm:Tree-Search(x,k)\textsc{Tree-Search}(x, k) — find key kk in the subtree rooted at xx
  1. 1
    if x=nilx = \text{nil} or k=key(x)k = key(x) then
  2. 2
    return xx
  3. 3
    if k<key(x)k < key(x) then
  4. 4
    return call Tree-Search(left(x),k)\textsc{Tree-Search}(left(x), k)
    k is in the left subtree
  5. 5
    else
  6. 6
    return call Tree-Search(right(x),k)\textsc{Tree-Search}(right(x), k)
    k is in the right subtree

The procedure is correct by the BST property: when , the property guarantees cannot be in 's right subtree, so discarding it loses nothing. The search visits one node per level and runs in time, where is the height of the tree.2

Searching for . At we go left (); at we go right (); at we stop. The path (accent) visits one node per level, and each step discards an entire subtree (red), so the search examines just of the nodes.

and are the degenerate cases of search: follow left pointers until they run out to reach the smallest key, or right pointers for the largest.

Algorithm:Tree-Minimum(x)\textsc{Tree-Minimum}(x) and Tree-Maximum(x)\textsc{Tree-Maximum}(x)
  1. 1
    Tree-Minimum(x):
  2. 2
    while left(x)nilleft(x) \ne \text{nil} do
  3. 3
    xleft(x)x \gets left(x)
    min is the leftmost node
  4. 4
    return xx
  5. 5
    Tree-Maximum(x):
  6. 6
    while right(x)nilright(x) \ne \text{nil} do
  7. 7
    xright(x)x \gets right(x)
    max is the rightmost node
  8. 8
    return xx

Inserting

Insertion reuses the search path. To insert key , walk down as if searching for it; when the walk falls off the bottom of the tree (reaches a nil child), that empty spot is exactly where belongs, preserving the BST property. We attach a new leaf there, remembering the parent so we can hook it in.

Algorithm:Tree-Insert(T,z)\textsc{Tree-Insert}(T, z) — insert node zz (with key(z)key(z) set) into BST TT
  1. 1
    ynily \gets \text{nil}
    y trails x
  2. 2
    xroot(T)x \gets root(T)
  3. 3
    while xnilx \ne \text{nil} do
  4. 4
    yxy \gets x
  5. 5
    if key(z)<key(x)key(z) < key(x) then
  6. 6
    xleft(x)x \gets left(x)
  7. 7
    else
  8. 8
    xright(x)x \gets right(x)
  9. 9
    parent(z)yparent(z) \gets y
  10. 10
    if y=nily = \text{nil} then
  11. 11
    root(T)zroot(T) \gets z
    tree was empty
  12. 12
    else if key(z)<key(y)key(z) < key(y) then
  13. 13
    left(y)zleft(y) \gets z
  14. 14
    else
  15. 15
    right(y)zright(y) \gets z

Like search, insertion walks one root-to-leaf path and costs . New keys always enter as leaves, which is what keeps insertion simple, but also what sets up the trouble we meet below.

Finding a successor

The successor of a node is the node with the smallest key greater than , the next key in sorted order. There are two cases, and neither needs a comparison of keys, only structure:

  • If has a right subtree, the successor is the minimum of that subtree: the smallest key still larger than .
  • If has no right subtree, the successor is the lowest ancestor whose left child is also an ancestor of ; we climb up until we move up a left link.
Algorithm:Tree-Successor(x)\textsc{Tree-Successor}(x) — next node in sorted order
  1. 1
    if right(x)nilright(x) \ne \text{nil} then
  2. 2
    return call Tree-Minimum(right(x))\textsc{Tree-Minimum}(right(x))
    min of right subtree
  3. 3
    yparent(x)y \gets parent(x)
  4. 4
    while ynily \ne \text{nil} and x=right(y)x = right(y) do
  5. 5
    xyx \gets y
    climb while x is a right child
  6. 6
    yparent(y)y \gets parent(y)
  7. 7
    return yy

Both cases follow a single vertical path, down into the right subtree or up through ancestors, so also runs in . Predecessor is the mirror image (left subtree's maximum, or climb until a right link).

The two successor cases. Left: has a right subtree, so its successor is that subtree's minimum, (descend left from ). Right: has no right subtree, so climb until moving up a left link — — giving successor .

Deleting a node

Deletion is the one operation that needs care, because removing an internal node leaves a hole that must be filled without disturbing the BST property. There are three cases, in increasing difficulty:

  1. has no children: just detach it from its parent.
  2. has one child: splice that child into 's position.
  3. has two children: 's successor is the minimum of its right subtree, so has no left child. Move into 's position; if was not 's direct child, first replace by its own right child.

All three reduce to a single primitive, , which replaces the subtree rooted at with the subtree rooted at :

Algorithm:Transplant(T,u,v)\textsc{Transplant}(T, u, v) — put subtree vv where subtree uu was
  1. 1
    if parent(u)=nilparent(u) = \text{nil} then
  2. 2
    root(T)vroot(T) \gets v
  3. 3
    else if u=left(parent(u))u = left(parent(u)) then
  4. 4
    left(parent(u))vleft(parent(u)) \gets v
  5. 5
    else
  6. 6
    right(parent(u))vright(parent(u)) \gets v
  7. 7
    if vnilv \ne \text{nil} then
  8. 8
    parent(v)parent(u)parent(v) \gets parent(u)
Algorithm:Tree-Delete(T,z)\textsc{Tree-Delete}(T, z) — remove node zz from the BST
  1. 1
    if left(z)=nilleft(z) = \text{nil} then
  2. 2
    call Transplant(T,z,right(z))\textsc{Transplant}(T, z, right(z))
    lift the right child
  3. 3
    else if right(z)=nilright(z) = \text{nil} then
  4. 4
    call Transplant(T,z,left(z))\textsc{Transplant}(T, z, left(z))
    lift the left child
  5. 5
    else
  6. 6
    yy \gets call Tree-Minimum(right(z))\textsc{Tree-Minimum}(right(z))
    successor, no left child
  7. 7
    if parent(y)zparent(y) \ne z then
  8. 8
    call Transplant(T,y,right(y))\textsc{Transplant}(T, y, right(y))
    detach y, lift its right child
  9. 9
    right(y)right(z)right(y) \gets right(z)
  10. 10
    parent(right(y))yparent(right(y)) \gets y
  11. 11
    call Transplant(T,z,y)\textsc{Transplant}(T, z, y)
    y into z's slot
  12. 12
    left(y)left(z)left(y) \gets left(z)
  13. 13
    parent(left(y))yparent(left(y)) \gets y

The two-child case is the subtle one. Replacing by its successor keeps every key in 's left subtree below the new root and every key in the right subtree above it, so the ordering survives:

Deleting a node with two children: its successor (the minimum of the right subtree, here ) takes its place, and the successor's own right child () fills the slot it vacated.

Each branch does a constant amount of pointer surgery plus at most one call, so runs in like the rest.

The order is already there: inorder walk

Because the BST property sorts keys left-to-right at every node, visiting the tree in order — left subtree, then the node, then right subtree — emits the keys in increasing order.3

Algorithm:Inorder-Walk(x)\textsc{Inorder-Walk}(x) — print the subtree at xx in sorted order
  1. 1
    if xnilx \ne \text{nil} then
  2. 2
    call Inorder-Walk(left(x))\textsc{Inorder-Walk}(left(x))
    smaller keys first
  3. 3
    print key(x)key(x)
  4. 4
    call Inorder-Walk(right(x))\textsc{Inorder-Walk}(right(x))
    then larger keys

The walk visits each of nodes once, so it runs in time. This gives a clean way to read out a sorted sequence, and shows that a BST is, in effect, a dynamic sorted list you can also splice into and search.

The catch: height is everything

Every operation above costs . So the BST is fast exactly when is small. The best case is a balanced tree, where the two subtrees of each node have nearly equal size; then and every operation is .

The worst case is a disaster. Suppose we insert keys in sorted order: . Each new key is larger than everything present, so it walks all the way right and attaches as the rightmost leaf. The tree degenerates into a single descending path, a glorified linked list:

Sorted insertions degenerate a BST into a path of height

Now , and search, insert, and successor all degrade to , no better than scanning an unsorted array.4 The very flexibility that made insertion easy (new keys land as leaves wherever the path takes them) lets an unlucky or adversarial insertion order ruin the shape.

This is the central tension of binary search trees:

We cannot control the order in which keys arrive. So the fix is to make the tree rebalance itself as keys come and go, forcing no matter what. Randomized BSTs (and treaps) achieve height in expectation; balanced search trees — red-black trees, AVL trees, B-trees — guarantee height in the worst case by maintaining extra structural invariants and repairing them after each update. That repair machinery is the subject of the next lesson.

Takeaways

  • A binary search tree stores keys under the BST property (left subtree node right subtree, recursively), so searching follows one root-to-leaf path.
  • Search, insert, minimum, maximum, successor, predecessor all walk a single vertical path and cost ; new keys enter as leaves.
  • An inorder walk emits the keys in sorted order in time; the order is baked into the shape.
  • Performance hinges entirely on height: when balanced, but for a degenerate (e.g. sorted-insertion) tree.
  • Because we cannot control insertion order, we need trees that rebalance themselves to guarantee , the motivation for balanced search trees.

Footnotes

  1. CLRS, Ch. 12 — Binary Search Trees (§12.1): the BST property as a recursive ordering invariant.
  2. Skiena, §3.4 — Binary Search Trees: search along a single root-to-leaf path in time.
  3. Erickson, Ch. — Binary Search Trees: an inorder traversal emits the keys in sorted order.
  4. CLRS, Ch. 12 — Binary Search Trees (§12.4): operations cost , degrading to for an unbalanced tree.
Practice

╌╌ END ╌╌