Tries & Prefix Trees
A trie stores a set of strings in a tree keyed by characters, so that insert, search, and prefix-test all run in time — the length of the key, independent of how many keys are stored. Shared prefixes are stored once, which makes tries the natural structure for autocomplete, wildcard dictionaries, board word-search, and — over the alphabet — the maximum-XOR-pair problem.
╌╌╌╌
A balanced search tree gives us lookups by comparing whole keys to
each other. But when the keys are strings, a single comparison is not :
deciding whether "international" precedes "internet" already costs five
character comparisons, so a BST of strings of length really spends
per operation. Worse, the BST throws away an obvious source of
structure, namely that "intern", "internet", and "international" all share a
prefix, and re-examines those shared characters on every descent.
A trie (from re_trie_val; usually pronounced try
) exploits exactly that
structure. Instead of comparing keys against each other, it routes each key
one character at a time down a tree whose edges are labeled by characters. A
root-to-node path spells a prefix; following the characters of a key leads to
the unique node that key reaches. Lookup costs , where the work depends only on
the key being searched for, never on , the number of stored keys.1
The structure
The root represents the empty prefix. Note the two-level bookkeeping: a node can
exist purely as an interior waypoint (a prefix of some longer key) yet not be a
key itself. In the word set the node
at path t exists but is not a stored word, so its is false; the nodes at
to, tea, and ted have .
Each node needs a way to find a child by character. Two standard choices:
- Array children. A fixed array of pointers per node (e.g. 26 for
lowercase letters, or
children[2]for a binary trie). Child lookup is a single index, at the cost of slots per node whether used or not. - Hash-map children. A
char → nodemap per node, so a node stores only the children it actually has. Smaller for sparse, large alphabets (Unicode), with a small constant-factor hashing overhead.
Operations: all
Insertion walks down from the root, creating a child node whenever the needed edge is missing, and sets on the final node. Search walks the same path but never creates; it fails the moment a needed edge is absent.
- 1
- 2for to do
- 3
- 4if then
- 5
- 6
- 7
- 1
- 2for to do
- 3
- 4if then
- 5return false
- 6
- 7return
Each loop runs times and does work per character (one indexed slot or
one hash probe), so insert, search, and the prefix test all run in .
startsWith(p) is identical to except it returns true as soon as
the path for exists, ignoring , because any node on a valid path
witnesses that is a prefix of some stored key.
The blue nodes are the six stored words; the white interior nodes (t, te,
i) are prefixes shared among them and stored exactly once. Looking up ten
visits three edges regardless of whether the trie holds six words or six million.
Space and trade-offs
With array children the worst case is pointers: keys, up to nodes each, slots per node. That bound is pessimistic, and tries are far better than it suggests precisely when prefixes are shared: every common prefix collapses to a single path, so a dictionary of English words (densely overlapping) stores far fewer than nodes. Hash-map children replace the factor with the actual child count, trading a constant for the array's indexing.
Applications
Autocomplete and prefix search. To offer completions for what a user has typed, walk to the node for the typed prefix in , then DFS the subtree beneath it to enumerate every stored key with that prefix, since the trie has already grouped them.
Wildcard dictionary (the .
problem). Design Add and Search Words asks for a
dictionary where a query may contain . matching any single character. Plain
search no longer follows one path: at a . we must branch into all children
and recurse. Concrete characters keep the search ; each . multiplies the
branching, but the trie still prunes any path that cannot match.
- 1if then
- 2return
- 3if then
- 4for each child of do
- 5if then return true
- 6return false
- 7else
- 8if then return false
- 9return
Word search on a board. Word Search II hunts for many dictionary words in a grid simultaneously. Building a trie of all target words lets one DFS over the board carry a trie pointer alongside the grid position: the instant the current board path spells a string that is not a prefix of any target, the missing trie edge prunes the entire branch. One traversal finds all words, and the shared prefixes mean overlapping targets share work.
The binary trie: maximum XOR pair
A beautiful non-string use treats a fixed-width integer as a string of bits over . Maximum XOR of Two Numbers asks for . Brute force is ; a binary trie solves it in for -bit numbers.
Insert every number bit-by-bit from the high bit down, so each root-to-leaf path of length is one number. To maximize the XOR of a query against the stored set, walk down from the root and at each bit greedily steer toward the opposite bit of : a differing bit contributes a at that (high) position, worth more than every lower bit combined. If the opposite child exists, take it; otherwise follow the only child available. The path traced spells the stored number that maximizes .
The trie holds . For the query the greedy walk wants the opposite bit at each level: . The high bit of is , so it takes the -child; the remaining two bits of are , so at each step the wanted opposite bit is available and the walk follows it. It lands on the stored number , giving , the maximum.
Multi-pattern and compressed variants
A trie of patterns augmented with failure links, pointers that, on a mismatch, jump to the longest proper suffix of the current match that is also a prefix in the trie, is the Aho–Corasick automaton: it scans a text once and reports every occurrence of every pattern in linear time, the multi-string generalization of KMP (which is single-pattern failure-link matching).2 For storage, a compressed trie (a radix tree or Patricia trie) contracts each chain of single-child nodes into one edge labeled by a whole substring, shrinking the node count dramatically; suffix trees and suffix arrays push the idea further, indexing all suffixes of a text for fast substring search, the subject of later lessons.3
Takeaways
- A trie is a rooted tree whose edges are labeled by characters; a root-to-node path spells a prefix, and an flag marks nodes that complete a stored key. Children are an array of size or a per-node map.
- Insert, search, and
startsWithall run in , the key length, and are independent of , beating a BST's . - Space is up to with array children, but shared prefixes are stored once, so prefix-heavy sets compress well; tries beat hash sets by giving ordered traversal, prefix queries, and no collisions.
- Tries power autocomplete, wildcard
.matching, and board word-search pruning; over a binary trie solves maximum-XOR pair by greedily walking toward the opposite bit from the high bit down. - Aho–Corasick = trie + failure links = multi-pattern KMP; Patricia / radix trees compress single-child chains, and suffix trees / arrays index all suffixes of a text.
Footnotes
- Skiena, § — String Data Structures: tries route keys character-by-character, giving search independent of the number of stored strings. ↩
- Erickson, Ch. — Data Structures: tries as a string dictionary; failure links extend a trie into the Aho–Corasick multi-pattern matcher. ↩
- Sedgewick, §5.2 — Tries: -way and ternary tries, prefix operations, and compressed (Patricia) variants for space. ↩
╌╌ END ╌╌