Binary Search Trees
A binary search tree keeps keys ordered so that every operation follows a single root-to-leaf path. We state the BST property, give search and insert, find minimum, maximum, and successor, see that an inorder walk emits the keys in sorted order, and confront the catch — every operation costs , and a carelessly built tree degrades to height , motivating balance.
╌╌╌╌
A hash table gives expected lookups but throws away order: it cannot tell you the smallest key, the next key after a given one, or every key in a range. A binary search tree (BST) keeps those queries fast by storing keys in a shape that records their order. Each node holds a key and pointers to a left child, a right child, and a parent; the keys are arranged so that the tree itself is a kind of decision diagram for searching. The result is a dynamic ordered dictionary supporting search, insert, delete, minimum, maximum, predecessor, successor, and in-order traversal, every one of them in time proportional to the tree's height.
The binary search tree property
The arrangement is governed by one local invariant, checked at every node :
Smaller keys live to the left, larger keys to the right, everywhere, not just between a node and its immediate children.1 This recursive constraint is exactly what lets a search discard half the tree at each step.
Reading this tree: the root is ; everything in its left subtree () is and everything in its right subtree () is , and the same holds recursively at and .
Searching
To search for a key , start at the root and walk down. At each node, if equals the node's key we are done; if is smaller we go left, otherwise we go right. Each comparison drops us one level, so the search traces a single root-to-leaf path.
- 1if or then
- 2return
- 3if then
- 4return callk is in the left subtree
- 5else
- 6return callk is in the right subtree
The procedure is correct by the BST property: when , the property guarantees cannot be in 's right subtree, so discarding it loses nothing. The search visits one node per level and runs in time, where is the height of the tree.2
and are the degenerate
cases of search: follow left pointers until they run out to reach the smallest
key, or right pointers for the largest.
- 1Tree-Minimum(x):
- 2while do
- 3min is the leftmost node
- 4return
- 5Tree-Maximum(x):
- 6while do
- 7max is the rightmost node
- 8return
Inserting
Insertion reuses the search path. To insert key , walk down as if searching
for it; when the walk falls off the bottom of the tree (reaches a nil child),
that empty spot is exactly where belongs, preserving the BST property. We
attach a new leaf there, remembering the parent so we can hook it in.
- 1y trails x
- 2
- 3while do
- 4
- 5if then
- 6
- 7else
- 8
- 9
- 10if then
- 11tree was empty
- 12else if then
- 13
- 14else
- 15
Like search, insertion walks one root-to-leaf path and costs . New keys always enter as leaves, which is what keeps insertion simple, but also what sets up the trouble we meet below.
Finding a successor
The successor of a node is the node with the smallest key greater than , the next key in sorted order. There are two cases, and neither needs a comparison of keys, only structure:
- If has a right subtree, the successor is the minimum of that subtree: the smallest key still larger than .
- If has no right subtree, the successor is the lowest ancestor whose left child is also an ancestor of ; we climb up until we move up a left link.
- 1if then
- 2return callmin of right subtree
- 3
- 4while and do
- 5climb while x is a right child
- 6
- 7return
Both cases follow a single vertical path, down into the right subtree or up through ancestors, so also runs in . Predecessor is the mirror image (left subtree's maximum, or climb until a right link).
Deleting a node
Deletion is the one operation that needs care, because removing an internal node leaves a hole that must be filled without disturbing the BST property. There are three cases, in increasing difficulty:
- has no children: just detach it from its parent.
- has one child: splice that child into 's position.
- has two children: 's successor is the minimum of its right subtree, so has no left child. Move into 's position; if was not 's direct child, first replace by its own right child.
All three reduce to a single primitive, , which replaces the subtree rooted at with the subtree rooted at :
- 1if then
- 2
- 3else if then
- 4
- 5else
- 6
- 7if then
- 8
- 1if then
- 2calllift the right child
- 3else if then
- 4calllift the left child
- 5else
- 6callsuccessor, no left child
- 7if then
- 8calldetach y, lift its right child
- 9
- 10
- 11cally into z's slot
- 12
- 13
The two-child case is the subtle one. Replacing by its successor keeps every key in 's left subtree below the new root and every key in the right subtree above it, so the ordering survives:
Each branch does a constant amount of pointer surgery plus at most one call, so runs in like the rest.
The order is already there: inorder walk
Because the BST property sorts keys left-to-right at every node, visiting the tree in order — left subtree, then the node, then right subtree — emits the keys in increasing order.3
- 1if then
- 2callsmaller keys first
- 3print
- 4callthen larger keys
The walk visits each of nodes once, so it runs in time. This gives a clean way to read out a sorted sequence, and shows that a BST is, in effect, a dynamic sorted list you can also splice into and search.
The catch: height is everything
Every operation above costs . So the BST is fast exactly when is small. The best case is a balanced tree, where the two subtrees of each node have nearly equal size; then and every operation is .
The worst case is a disaster. Suppose we insert keys in sorted order: . Each new key is larger than everything present, so it walks all the way right and attaches as the rightmost leaf. The tree degenerates into a single descending path, a glorified linked list:
Now , and search, insert, and successor all degrade to , no better than scanning an unsorted array.4 The very flexibility that made insertion easy (new keys land as leaves wherever the path takes them) lets an unlucky or adversarial insertion order ruin the shape.
This is the central tension of binary search trees:
We cannot control the order in which keys arrive. So the fix is to make the tree rebalance itself as keys come and go, forcing no matter what. Randomized BSTs (and treaps) achieve height in expectation; balanced search trees — red-black trees, AVL trees, B-trees — guarantee height in the worst case by maintaining extra structural invariants and repairing them after each update. That repair machinery is the subject of the next lesson.
Takeaways
- A binary search tree stores keys under the BST property (left subtree node right subtree, recursively), so searching follows one root-to-leaf path.
- Search, insert, minimum, maximum, successor, predecessor all walk a single vertical path and cost ; new keys enter as leaves.
- An inorder walk emits the keys in sorted order in time; the order is baked into the shape.
- Performance hinges entirely on height: when balanced, but for a degenerate (e.g. sorted-insertion) tree.
- Because we cannot control insertion order, we need trees that rebalance themselves to guarantee , the motivation for balanced search trees.
Footnotes
- CLRS, Ch. 12 — Binary Search Trees (§12.1): the BST property as a recursive ordering invariant. ↩
- Skiena, §3.4 — Binary Search Trees: search along a single root-to-leaf path in time. ↩
- Erickson, Ch. — Binary Search Trees: an inorder traversal emits the keys in sorted order. ↩
- CLRS, Ch. 12 — Binary Search Trees (§12.4): operations cost , degrading to for an unbalanced tree. ↩
╌╌ END ╌╌