Tries & Prefix Trees

A balanced search tree gives us $O(\log n)$ lookups by comparing whole keys to each other. But when the keys are strings, a single comparison is not $O(1)$: deciding whether "international" precedes "internet" already costs seven character comparisons (six to walk the shared "intern", one more to tell a from e), so a BST of $n$ strings of length $L$ really spends $O(L \log n)$ per operation. Worse, the BST throws away an obvious source of structure, namely that "intern", "internet", and "international" all share a prefix, and re-examines those shared characters on every descent.
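The cost of a single string comparison can be counted directly. The sketch below is illustrative only: `comparisons_to_order` is a hypothetical helper, not part of the lesson's code, and it models a plain left-to-right comparison that stops at the first differing character.

```python
def comparisons_to_order(a: str, b: str) -> int:
    """Character comparisons a left-to-right string compare performs."""
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break  # the first differing position decides the order
    return count


# "intern" (6 characters) is shared; the 7th comparison, 'a' vs 'e', decides.
print(comparisons_to_order("international", "internet"))  # 7
```

When one string is a prefix of the other, the loop runs out of characters instead, and the lengths settle the order.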

A trie (the middle syllable of retrieval, usually pronounced try) exploits that structure. Instead of comparing keys against each other, it routes each key one character at a time down a tree whose edges are labeled by characters. A root-to-node path spells a prefix; following the characters of a key leads to the unique node that key reaches. Lookup costs $O (L)$ , where the work depends only on the key being searched for, never on $n$ , the number of stored keys.¹

The structure

The root represents the empty prefix. The bookkeeping is two-level: a node can exist purely as an interior waypoint (a prefix of some longer key) yet not be a key itself, so each node carries a boolean flag $isEnd$ that is true exactly when the path to it spells a stored key. In the word set $\{to, tea, ted\}$ the node at path t exists but is not a stored word, so its $isEnd$ is false; the nodes at to, tea, and ted have $isEnd = true$. The flag is what lets a stored word be a prefix of another stored word: with both in and inn present, the node at in is simultaneously a terminal (its own flag is true) and an interior waypoint on the path to inn. Without the flag, the structure could not tell "in is a word" apart from "in is merely a prefix of inn".

Each node needs a way to find a child by character. Two standard choices:

Array children. A fixed array of $∣Σ∣$ pointers per node (e.g. 26 for lowercase letters, or children[2] for a binary trie). Child lookup is a single $O (1)$ index, at the cost of $∣Σ∣$ slots per node whether used or not.
Hash-map children. A char → node map per node, so a node stores only the children it actually has. Smaller for sparse, large alphabets (Unicode), with a small constant-factor hashing overhead.
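The two child layouts can be put side by side in a few lines. This is a minimal sketch under stated assumptions: the class names `ArrayNode` and `MapNode` are hypothetical, and the array variant assumes keys drawn only from the 26 lowercase letters.

```python
from typing import Optional

ALPHABET_SIZE = 26  # assumption: keys use only 'a'..'z'


class ArrayNode:
    """Array children: one slot per alphabet symbol, used or not."""

    def __init__(self) -> None:
        self.children: list[Optional["ArrayNode"]] = [None] * ALPHABET_SIZE
        self.is_end = False

    def child(self, character: str) -> Optional["ArrayNode"]:
        return self.children[ord(character) - ord("a")]  # direct index


class MapNode:
    """Hash-map children: only the edges that exist are stored."""

    def __init__(self) -> None:
        self.children: dict[str, "MapNode"] = {}
        self.is_end = False

    def child(self, character: str) -> Optional["MapNode"]:
        return self.children.get(character)  # one hash probe


# give each root a single 't' edge and compare the slots held
array_root, map_root = ArrayNode(), MapNode()
array_root.children[ord("t") - ord("a")] = ArrayNode()
map_root.children["t"] = MapNode()
print(len(array_root.children), len(map_root.children))  # 26 1
```

With one child each, the array node still holds 26 slots while the map node holds a single entry.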

Operations: all $O (L)$

Insertion walks down from the root, creating a child node whenever the needed edge is missing, and sets $isEnd$ on the final node. Search walks the same path but never creates; it fails the moment a needed edge is absent.

Algorithm: $\textsc{Insert}(T, w)$: add word $w = w_1 w_2 \dots w_L$

$x \gets root(T)$
for $i \gets 1$ to $L$ do
    $c \gets w_i$
    if $child(x, c) = \text{nil}$ then
        $child(x, c) \gets \textsc{NewNode}()$
    $x \gets child(x, c)$
$isEnd(x) \gets \text{true}$

Algorithm: $\textsc{Search}(T, w)$: is $w$ a stored key?

$x \gets root(T)$
for $i \gets 1$ to $L$ do
    $c \gets w_i$
    if $child(x, c) = \text{nil}$ then
        return false
    $x \gets child(x, c)$
return $isEnd(x)$

Each loop runs at most $L$ times and does $O(1)$ work per character (one indexed slot or one hash probe), so insert, search, and the prefix test all run in $O(L)$. startsWith(p) is identical to $Search$ except that it returns true once the whole path for $p$ has been walked, ignoring $isEnd$, because any node on a valid path witnesses that $p$ is a prefix of some stored key.

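Correctness rests on one invariant that every operation preserves: the walk that spells a string $s$ from the root ends at a node with $isEnd = true$ if and only if $s$ is a stored key.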

$Insert$ preserves the invariant because it only ever creates the node its own path requires and flags only its final node; it cannot disturb any other key's path, which is why a trie needs no rebalancing — the shape is determined by the key set alone, not by insertion order.

Cost of the three string-dictionary structures per operation on $n$ keys of length $L$. The trie alone drops the $\log n$ factor and answers prefix queries directly; the hash set is $O(L)$ but unordered

A trie over {to, tea, ted, ten, in, inn}; shared prefixes stored once, lookup is $O(L)$

The blue nodes are the six stored words; the white interior nodes (t, te, i) are prefixes shared among them and stored exactly once. Looking up ten visits three edges regardless of whether the trie holds six words or six million.

A build trace: counting created nodes

To see the sharing, build that trie from scratch, one insert at a time. Each insert walks its word and creates a node only where the path runs out:

Insert tea into the empty trie. No edges exist, so all three characters miss: create nodes for t, te, tea (3 new nodes); flag tea.
Insert ted. The walk reuses t and te (two existing edges), then d misses: 1 new node.
Insert to. Reuses t; o misses: 1 new node.
Insert in. Nothing under i exists: 2 new nodes.
Insert inn. Reuses i and in; the second n misses: 1 new node.
Insert ten. Reuses t and te; n misses: 1 new node.

The six words contain $3 + 3 + 2 + 2 + 3 + 3 = 16$ characters, but the finished trie has only 9 nodes besides the root, because the 7 characters that ride shared prefixes cost nothing. The more the key set overlaps, the wider that gap grows; inserting tent next would cost exactly one node.
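The trace can be replayed mechanically. The sketch below uses nested dictionaries as stand-in nodes and omits the end-of-word flag, since only node creation is being counted; `count_new_nodes` is a hypothetical helper introduced for this illustration.

```python
def count_new_nodes(words: list[str]) -> list[int]:
    """Nodes created by each insert into an initially empty trie."""
    root: dict = {}  # nested dicts stand in for nodes; flags are omitted
    created = []
    for word in words:
        node, new = root, 0
        for character in word:
            if character not in node:
                node[character] = {}  # the path ran out: create a node
                new += 1
            node = node[character]
        created.append(new)
    return created


order = ["tea", "ted", "to", "in", "inn", "ten"]
per_insert = count_new_nodes(order)
print(per_insert, sum(per_insert), sum(map(len, order)))
# [3, 1, 1, 2, 1, 1] 9 16
```

Reordering the inserts changes the per-insert counts but not the total, because the final shape depends only on the key set.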

Inserting \texttt{ten} into the trie of {to, tea, ted}: the walk reuses the existing $t$ and $e$ edges (accent) and creates exactly one node for the final \texttt{n}. Shared prefixes make later inserts cheap

Search versus prefix test: the word-inside-a-word case

The distinction between $Search$ and startsWith comes down to the $isEnd$ flag, and the running trie over $\{to, tea, ted, ten, in, inn\}$ exercises every case:

search("te") walks t, e successfully but reads $isEnd = false$ at the te node: false. The path exists only as scaffolding for longer words.
startsWith("te") walks the same two edges and returns true immediately, because the node's existence is the witness.
search("in") returns true even though in has a child: a terminal node may still be interior.
search("int") fails at the third character, because the in node has no t edge. These are the only two ways search can fail: a missing edge along the way, or a false flag at the end. There is no third failure mode.
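The four cases can be checked with a throwaway trie built from nested dictionaries. This is a sketch under one loud assumption: the marker `END = "$"` never occurs inside a key, so it can stand in for the end-of-word flag. The function names are illustrative only.

```python
END = "$"  # assumption: "$" never occurs in a key, so it can mark word ends


def build(words: list[str]) -> dict:
    root: dict = {}
    for word in words:
        node = root
        for character in word:
            node = node.setdefault(character, {})
        node[END] = True
    return root


def node_at(root: dict, text: str):
    """The node reached by spelling `text`, or None on a missing edge."""
    node = root
    for character in text:
        node = node.get(character)
        if node is None:
            return None
    return node


def search(root: dict, word: str) -> bool:
    node = node_at(root, word)
    return node is not None and END in node  # path exists AND flag is set


def starts_with(root: dict, prefix: str) -> bool:
    return node_at(root, prefix) is not None  # path existence alone


trie = build(["to", "tea", "ted", "ten", "in", "inn"])
print(search(trie, "te"), starts_with(trie, "te"))  # False True
print(search(trie, "in"), search(trie, "int"))      # True False
```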

Collected into one reference Trie, the interface is a direct transcription of the $O(L)$ walks above: insert creates missing links as it descends, __contains__ and starts_with share a single _node_at walk and differ only in whether they read $isEnd$, and keys_with_prefix runs the prefix-node walk and then a DFS over the subtree. Hash-map children keep each node to the edges it actually uses.

trie.py

from collections.abc import Iterator
from typing import Optional

class TrieNode:
  """
    A single trie node: child links keyed by character, plus a flag\n
    marking the end of a stored word.\n
  """

  def __init__(self) -> None:
    self.children: dict[str, TrieNode] = {}
    self.is_end_of_word: bool = False

class Trie:
  """
    A set of strings supporting prefix queries.\n
  """

  def __init__(self) -> None:
    self.root: TrieNode = TrieNode()

  def insert(self, word: str) -> None:
    """
      Add `word` to the trie, creating nodes as needed.\n
    """
    # descend the key, creating any missing child links.
    node = self.root
    for character in word:
      node = node.children.setdefault(character, TrieNode())

    node.is_end_of_word = True

  def _node_at(self, prefix: str) -> Optional[TrieNode]:
    """
      The node reached by spelling out `prefix`, or None if absent.\n
    """
    node = self.root
    for character in prefix:
      next_node = node.children.get(character)
      if next_node is None:
        return None
      node = next_node
    return node

  def __contains__(self, word: str) -> bool:
    node = self._node_at(word)
    return node is not None and node.is_end_of_word

  def starts_with(self, prefix: str) -> bool:
    """
      Whether any stored key begins with `prefix`.\n
    """
    return self._node_at(prefix) is not None

  def keys_with_prefix(self, prefix: str) -> Iterator[str]:
    """
      Yield every stored key beginning with `prefix`.\n
    """
    start: Optional[TrieNode] = self._node_at(prefix)
    if start is None:
      # nothing is stored under this prefix.
      return

    # depth-first walk emitting each completed word as a marked node is hit.
    def walk(node: TrieNode, word: str) -> Iterator[str]:
      if node.is_end_of_word:
        yield word
      for character, child in node.children.items():
        yield from walk(child, word + character)

    yield from walk(start, prefix)

Deleting a key: prune on the way back

Deletion requires care because of the shared structure. Clearing $isEnd$ at the word's node is always correct (searches for the deleted word now return false), but it can leave behind a chain of flagless, childless nodes that no surviving key uses. Removing ten from our running trie by flag alone leaves the n node dangling under te forever.

To address this, use a recursive delete that prunes on the way back up. Recurse to the end of the word, clear the flag, then, as the recursion unwinds, delete any node that is now both flagless and childless; stop pruning at the first node that still serves a purpose.

Algorithm: $\textsc{Delete}(x, w, i)$: remove $w$; returns true iff $x$ should be pruned

if $i > L$ then
    $isEnd(x) \gets \text{false}$    // reached the word's node
else
    $c \gets w_i$
    if $child(x, c) = \text{nil}$ then return false    // $w$ was never stored
    if $\textsc{Delete}(child(x, c), w, i+1)$ then
        $child(x, c) \gets \text{nil}$    // unlink the pruned child
return $isEnd(x) = \text{false}$ and $x$ has no children and $x \neq root(T)$

The return value is the pruning decision: a node survives if its flag is set (it terminates another word) or it still has a child (it lies on another word's path). The word set $\{in, inn\}$ again covers the cases:

Delete inn. Clear the flag on the deep n node; it has no children, so it is pruned and unlinked. Unwinding reaches the in node, whose flag is still true — pruning stops. One node removed.
Delete in (from the original set). Clear the flag on the in node; it still has the n child leading to inn, so nothing is pruned. Zero nodes removed — the structure is untouched, only the flag flips.
Delete tea from $\{tea\}$ alone: all three nodes fail the survival test in turn and the whole chain unwinds away.
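A direct Python rendering of the recursive delete might look as follows. It is a sketch, not the lesson's reference code: `Node`, `insert`, `delete`, and `count_nodes` are names chosen for this illustration, and `count_nodes` includes the root in its total.

```python
class Node:
    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}
        self.is_end = False


def insert(root: Node, word: str) -> None:
    node = root
    for character in word:
        node = node.children.setdefault(character, Node())
    node.is_end = True


def delete(root: Node, word: str) -> None:
    """Remove `word` if present, pruning nodes no other key needs."""

    def prune(node: Node, i: int) -> bool:
        # returns True when `node` should be unlinked by its parent
        if i == len(word):
            node.is_end = False  # reached the word's node
        else:
            child = node.children.get(word[i])
            if child is None:
                return False  # the word was never stored
            if prune(child, i + 1):
                del node.children[word[i]]  # unlink the pruned child
        return not node.is_end and not node.children and node is not root

    prune(root, 0)


def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = Node()
for word in ("in", "inn"):
    insert(root, word)
delete(root, "inn")
print(count_nodes(root))  # 3: the root, i, and in survive
```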

Each case costs one $O(L)$ descent and one $O(L)$ unwind, so delete is $O(L)$ like everything else. An equivalent bookkeeping scheme stores a reference count in each node (the number of stored words whose path passes through it), incremented on insert and decremented on delete; a node is pruned when its count hits zero. The effect is the same, and the counts also answer "how many words start with $p$?" in $O(∣p∣)$.
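The reference-count variant can be sketched as below. The names `CountedNode`, `CountedTrie`, and the field `passes` are hypothetical. One design choice is an assumption of this sketch rather than something the text prescribes: insert and delete first check membership, so that a duplicate insert or a delete of an absent word cannot corrupt the counts.

```python
class CountedNode:
    def __init__(self) -> None:
        self.children: dict[str, "CountedNode"] = {}
        self.is_end = False
        self.passes = 0  # stored words whose path goes through this node


class CountedTrie:
    def __init__(self) -> None:
        self.root = CountedNode()

    def contains(self, word: str) -> bool:
        node = self.root
        for character in word:
            node = node.children.get(character)
            if node is None:
                return False
        return node.is_end

    def insert(self, word: str) -> None:
        if self.contains(word):
            return  # counting a word twice would corrupt the counts
        node = self.root
        node.passes += 1
        for character in word:
            node = node.children.setdefault(character, CountedNode())
            node.passes += 1
        node.is_end = True

    def delete(self, word: str) -> None:
        if not self.contains(word):
            return
        node = self.root
        node.passes -= 1
        for character in word:
            child = node.children[character]
            child.passes -= 1
            if child.passes == 0:
                # no surviving word uses this chain: drop it in one unlink
                del node.children[character]
                return
            node = child
        node.is_end = False

    def count_with_prefix(self, prefix: str) -> int:
        node = self.root
        for character in prefix:
            node = node.children.get(character)
            if node is None:
                return 0
        return node.passes


trie = CountedTrie()
for word in ["to", "tea", "ted", "ten", "in", "inn"]:
    trie.insert(word)
print(trie.count_with_prefix("te"), trie.count_with_prefix("t"))  # 3 4
```

Because the counts prune on the way down, this delete needs no recursion: the first node whose count reaches zero is unlinked along with everything beneath it.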

Prune-on-the-way-back in {in, inn}. Left: deleting \texttt{inn} clears its flag, finds the node childless, and prunes it (dashed); the unwind stops at \texttt{in}, still flagged. Right: deleting \texttt{in} only clears the flag — the node stays, because \texttt{inn} still needs the path through it

Space and trade-offs

With array children the worst case is $O(n \cdot L \cdot ∣Σ∣)$ pointers: $n$ keys, up to $L$ nodes each, $∣Σ∣$ slots per node. That bound is pessimistic, and tries are far better than it suggests precisely when prefixes are shared: every common prefix collapses to a single path, so a dictionary of English words (densely overlapping) stores far fewer than $n \cdot L$ nodes. Hash-map children replace the $∣Σ∣$ factor with the actual child count, giving up the array's direct $O(1)$ indexing for a lookup that pays a constant-factor hashing cost.

For example, with 64-bit pointers and $∣Σ∣ = 26$, an array node carries $26 \times 8 = 208$ bytes of child slots plus the flag (call it 216 bytes) whether it has 26 children or one. Deep in a trie most nodes have exactly one child (long unshared word tails), so almost all of those slots hold nil. A 100,000-word dictionary that compresses to roughly 250,000 nodes then occupies about $250{,}000 \times 216 \approx 54$ MB of node storage, some fifty times the ~1 MB of raw text it encodes. Hash-map children shrink a one-child node to one map entry, but each entry drags its own overhead (hashing, buckets, per-entry headers: tens of bytes), and child lookup gains a constant factor over a direct index. The binary trie sits at the other extreme and is why the XOR trick below is cheap: $∣Σ∣ = 2$ means just two pointers, 16 bytes per node.
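The arithmetic is short enough to replay in code. The constants mirror the figures in the text; `FLAG_BYTES = 8` is an assumption that the flag pads out to one machine word, which is what rounds 208 up to 216.

```python
POINTER_BYTES = 8   # 64-bit pointers
ALPHABET_SIZE = 26
FLAG_BYTES = 8      # assumption: the flag pads out to one machine word

array_node = ALPHABET_SIZE * POINTER_BYTES + FLAG_BYTES
binary_node = 2 * POINTER_BYTES  # two child pointers, flag not counted
total = 250_000 * array_node

print(array_node, binary_node, total / 1e6)  # 216 16 54.0
```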

The regimes, then: array children when the alphabet is small and speed matters (26 lowercase letters, or the two bit values of a binary trie); hash-map children when the alphabet is large or sparse (Unicode); a radix tree (end of this lesson) when memory dominates and the one-child chains must go.

Applications

Autocomplete and prefix search. To offer completions for what a user has typed, walk to the node for the typed prefix in $O (L)$ , then DFS the subtree beneath it to enumerate every stored key with that prefix, since the trie has already grouped them. The enumeration costs $O (L + s)$ where $s$ is the size of the emitted subtree — proportional to the answer, not to the dictionary — and if the DFS visits children in alphabet order, the completions come out already sorted.

autocomplete.py

from typing import NamedTuple, Optional

class Completion(NamedTuple):
  """
    One suggested term and the total weight that ranks it.\n
  """
  term: str
  weight: float

class AutocompleteNode:
  """
    One trie node: child links keyed by character and the weight of the\n
    term ending here (0 when no stored term ends at this node).\n
  """

  def __init__(self) -> None:
    self.children: dict[str, AutocompleteNode] = {}
    self.weight: float = 0.0

class Autocomplete:
  """
    A weighted term set answering prefix-completion queries.\n
  """

  def __init__(self) -> None:
    self.root: AutocompleteNode = AutocompleteNode()

  def insert(self, term: str, weight: float = 1.0) -> None:
    """
      Add `term` with `weight`, accumulating onto any prior weight.\n
    """
    node = self.root
    for character in term:
      node = node.children.setdefault(character, AutocompleteNode())
    node.weight += weight

  def _node_at(self, prefix: str) -> Optional[AutocompleteNode]:
    """
      The node reached by spelling out `prefix`, or None if absent.\n
    """
    node = self.root
    for character in prefix:
      next_node = node.children.get(character)
      if next_node is None:
        return None
      node = next_node
    return node

  def complete(self, prefix: str, limit: Optional[int] = None) -> list[Completion]:
    """
      Up to `limit` terms beginning with `prefix`, ranked by descending\n
      weight then lexicographically. `limit=None` returns them all.\n
    """
    start: Optional[AutocompleteNode] = self._node_at(prefix)
    if start is None:
      return []

    matches: list[Completion] = []

    def gather(node: AutocompleteNode, spelled: str) -> None:
      if node.weight > 0:
        matches.append(Completion(spelled, node.weight))
      for character, child in node.children.items():
        gather(child, spelled + character)

    gather(start, prefix)

    # heaviest first; lexicographic order settles equal weights.
    matches.sort(key=lambda completion: (-completion.weight, completion.term))
    return matches if limit is None else matches[:limit]

Autocomplete for prefix \texttt{te}: walk to the prefix node in $O(L)$ (accent path $t \to e$), then DFS the subtree to emit every completion (\texttt{tea}, \texttt{ted}, \texttt{ten}), already grouped under that node

Wildcard dictionary (the . problem). Design Add and Search Words asks for a dictionary where a query may contain . matching any single character. Plain search no longer follows one path: at a . we must branch into all children and recurse. Concrete characters keep the search $O (L)$ ; each . multiplies the branching, but the trie still prunes any path that cannot match.

Algorithm: $\textsc{WildSearch}(x, w, i)$: match $w$ from node $x$; $w_i$ may be \texttt{.}

if $i > L$ then
    return $isEnd(x)$
if $w_i = \texttt{"."}$ then
    for each child $c$ of $x$ do
        if $\textsc{WildSearch}(c, w, i+1)$ then return true
    return false
else
    if $child(x, w_i) = \text{nil}$ then return false
    return $\textsc{WildSearch}(child(x, w_i), w, i+1)$

Wildcard search for \texttt{t.n}: the concrete \texttt{t} follows one edge, the \texttt{.} branches into every child (red, both \texttt{o} and \texttt{e}), and only the \texttt{e} branch survives to spell \texttt{ten}. A \texttt{.} fans the search; concrete characters keep it on one path

In the worst case a query of $d$ dots over alphabet $Σ$ can visit $∣Σ ∣^{d}$ paths, so the bound degrades to $O (∣Σ ∣^{d} \cdot L)$ — but every branch dies the instant its next concrete character has no edge, and in a real dictionary almost all of them die immediately. The figure's query t.n fans to two children at the dot and kills the o branch one character later.

Word search on a board. Word Search II hunts for many dictionary words in a grid simultaneously. Building a trie of all target words lets one DFS over the board carry a trie pointer alongside the grid position: the instant the current board path spells a string that is not a prefix of any target, the missing trie edge prunes the entire branch. One traversal finds all words, and the shared prefixes mean overlapping targets share work.

wildcard_dictionary.py

class WildcardNode:
  """
    One trie node: child links keyed by character plus a flag marking the\n
    end of a stored word.\n
  """

  def __init__(self) -> None:
    self.children: dict[str, WildcardNode] = {}
    self.is_end_of_word: bool = False

class WildcardDictionary:
  """
    A set of words supporting search with '.' matching any single character.\n
  """

  def __init__(self) -> None:
    self.root: WildcardNode = WildcardNode()

  def add_word(self, word: str) -> None:
    """
      Insert `word`, creating nodes along its path as needed.\n
    """
    node = self.root
    for character in word:
      node = node.children.setdefault(character, WildcardNode())
    node.is_end_of_word = True

  def search(self, pattern: str) -> bool:
    """
      Whether some stored word matches `pattern`, where each '.' in the\n
      pattern matches any single character.\n
    """

    def matches_from(node: WildcardNode, position: int) -> bool:
      # the whole pattern is consumed only at a stored word's end.
      if position == len(pattern):
        return node.is_end_of_word

      character: str = pattern[position]

      # a wildcard tries every child; any surviving branch is a match.
      if character == ".":
        return any(
          matches_from(child, position + 1)
          for child in node.children.values()
        )

      # a concrete character follows the single matching edge, if present.
      child = node.children.get(character)
      if child is None:
        return False
      return matches_from(child, position + 1)

    return matches_from(self.root, 0)

board_word_search.py

from collections.abc import Iterable
from typing import Optional

class BoardTrieNode:
  """
    One trie node: child links keyed by character plus the complete word\n
    stored at the node that ends it (None for interior nodes).\n
  """

  def __init__(self) -> None:
    self.children: dict[str, BoardTrieNode] = {}
    self.word: Optional[str] = None

def _build_trie(words: Iterable[str]) -> BoardTrieNode:
  """
    A trie holding every target word, each tagged at its terminal node.\n
  """
  root: BoardTrieNode = BoardTrieNode()
  for word in words:
    node = root
    for character in word:
      node = node.children.setdefault(character, BoardTrieNode())
    node.word = word
  return root

def find_words(
  board: list[list[str]],
  words: Iterable[str],
) -> list[str]:
  """
    Every target word that can be spelled by a path of orthogonally\n
    adjacent cells in `board`, using each cell at most once per word.\n
    Each found word appears once, in lexicographic order.\n
  """
  root: BoardTrieNode = _build_trie(words)
  # an empty board can spell nothing.
  if not board or not board[0]:
    return []

  height: int = len(board)
  width: int = len(board[0])
  found: set[str] = set()

  def explore(row: int, column: int, node: BoardTrieNode) -> None:
    # this cell's letter must extend the current trie path.
    character: str = board[row][column]
    child = node.children.get(character)
    if child is None:
      return

    # a terminal node completes a target word.
    if child.word is not None:
      found.add(child.word)

    # mark the cell used so this path cannot revisit it.
    board[row][column] = "#"

    # recurse into each in-bounds, still-unused neighbour.
    for next_row, next_column in (
      (row - 1, column),
      (row + 1, column),
      (row, column - 1),
      (row, column + 1),
    ):
      if 0 <= next_row < height and 0 <= next_column < width:
        if board[next_row][next_column] != "#":
          explore(next_row, next_column, child)

    # restore the cell for sibling paths.
    board[row][column] = character

  # launch a DFS from every cell, then report words once, sorted.
  for row in range(height):
    for column in range(width):
      explore(row, column, root)
  return sorted(found)

The binary trie: maximum XOR pair

A non-string application treats a fixed-width integer as a string of bits over $Σ = \{0, 1\}$. Maximum XOR of Two Numbers asks for $\max_{i, j} (a_{i} \oplus a_{j})$. Brute force is $Θ(n^{2})$; a binary trie solves it in $O(n \cdot b)$ for $b$-bit numbers.

Insert every number bit-by-bit from the high bit down, so each root-to-leaf path of length $b$ is one number. To maximize the XOR of a query $a$ against the stored set, walk down from the root and at each bit greedily steer toward the opposite bit of $a$ : a differing bit contributes a $1$ at that (high) position. If the opposite child exists, take it; otherwise follow the only child available. The path traced spells the stored number that maximizes $a \oplus (\cdot)$ .

The greedy choice is safe because of the geometric-series gap: winning bit position $k$ is worth $2^{k}$ , while every lower position combined is worth at most

$$2^{k-1} + 2^{k-2} + \dots + 2^{0} = 2^{k} - 1 < 2^{k}.$$

So any candidate that differs from $a$ at bit $k$ beats every candidate that agrees there, no matter how the lower bits fall; the usual exchange argument collapses to one inequality. The greedy walk never needs to backtrack, and one subtlety makes it total: the trie stores complete $b$-bit paths (leading zeros included), so whenever the preferred child is missing, the other child must exist, and the walk always reaches depth $b$. Building the trie costs $O(n \cdot b)$; querying each of the $n$ numbers costs $O(b)$; total $O(n \cdot b)$ versus $Θ(n^{2})$ for brute force. For 32-bit values and $n = 10^{5}$ that is $3.2 \times 10^{6}$ steps instead of on the order of $10^{10}$.

Binary trie on {010, 011, 110}; greedy walk for a query maximizes XOR by taking opposite bits

The trie holds $\{010, 011, 110\}$. For the query $a = 100$ the greedy walk wants the opposite bit at each level: $0, 1, 1$. The high bit of $a$ is $1$, so it takes the $0$-child; the remaining two bits of $a$ are $0$, so at each step the wanted opposite bit $1$ is available and the walk follows it. It lands on the stored number $011$, giving $100 \oplus 011 = 111$, the maximum.
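The same walk can be mimicked without building a trie, by filtering a list of bit strings level by level: the candidates that survive after $k$ steps are exactly the numbers in the subtree the trie walk would be standing on. This is an illustration only (each step scans the candidates, so it lacks the trie's $O(b)$ query cost), and `best_partner` is a hypothetical name.

```python
def best_partner(stored: list[str], query: str) -> str:
    """Greedy walk: at each bit keep the candidates holding the opposite bit."""
    candidates = stored
    for position, bit in enumerate(query):
        wanted = "1" if bit == "0" else "0"
        opposite = [s for s in candidates if s[position] == wanted]
        # fall back to the same bit only when no candidate has the opposite
        candidates = opposite or candidates
    return candidates[0]


stored = ["010", "011", "110"]
partner = best_partner(stored, "100")
print(partner, format(int(partner, 2) ^ int("100", 2), "03b"))  # 011 111
```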

maximum_xor_pair.py

from collections.abc import Sequence
from typing import Optional

class BitTrieNode:
  """
    One node of a binary trie: two child links, one per bit value.\n
    `children[0]` follows a 0 bit and `children[1]` follows a 1 bit.\n
  """

  def __init__(self) -> None:
    self.children: list[Optional[BitTrieNode]] = [None, None]

class BinaryTrie:
  """
    A set of fixed-width non-negative integers stored bit-by-bit.\n
    `bit_width` is the number of bits read per number, from high to low.\n
  """

  def __init__(self, bit_width: int = 31) -> None:
    self.bit_width: int = bit_width
    self.root: BitTrieNode = BitTrieNode()

  def insert(self, number: int) -> None:
    """
      Add `number` to the trie, one bit at a time from the high bit down.\n
    """
    node = self.root
    for position in range(self.bit_width - 1, -1, -1):
      # descend the high-to-low bit, creating the branch on first use.
      bit: int = (number >> position) & 1
      child = node.children[bit]
      if child is None:
        child = BitTrieNode()
        node.children[bit] = child
      node = child

  def max_xor_with(self, query: int) -> int:
    """
      The largest `query ^ stored` over every number already inserted.\n
      The trie must be non-empty.\n
    """
    node = self.root
    best: int = 0
    for position in range(self.bit_width - 1, -1, -1):
      bit: int = (query >> position) & 1
      desired: int = 1 - bit

      # the opposite-bit child sets a 1 here; otherwise this position is 0.
      opposite = node.children[desired]
      if opposite is not None:
        best |= 1 << position
        node = opposite
      else:
        same_child = node.children[bit]
        assert same_child is not None
        node = same_child

    return best

def maximum_xor_pair(numbers: Sequence[int], bit_width: int = 31) -> int:
  """
    The maximum XOR over all pairs of entries in `numbers`, each assumed\n
    to be a non-negative integer below `2 ** bit_width`. Returns 0 for\n
    fewer than two numbers, since no pair exists.\n
  """
  if len(numbers) < 2:
    return 0

  trie: BinaryTrie = BinaryTrie(bit_width)
  trie.insert(numbers[0])
  best: int = 0
  for number in numbers[1:]:

    # every earlier number is already stored, so this covers each pair once.
    best = max(best, trie.max_xor_with(number))
    trie.insert(number)
  return best

Multi-pattern and compressed variants

A trie of patterns augmented with failure links (pointers that, on a mismatch, jump to the longest proper suffix of the current match that is also a prefix in the trie) is the Aho–Corasick automaton: it scans a text once and reports every occurrence of every pattern in time linear in the text length, the total pattern length, and the number of matches reported. It is the multi-string generalization of KMP (which is single-pattern failure-link matching).²
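A compact sketch of the construction follows, assuming non-empty patterns. The function names `build_automaton` and `find_all` and the parallel-list node layout are choices made for this illustration, not a reference implementation. Failure links are filled breadth-first so that a node's failure target is always finished before the node itself.

```python
from collections import deque


def build_automaton(patterns: list[str]):
    # parallel lists indexed by node id; node 0 is the root
    goto: list[dict[str, int]] = [{}]
    fail: list[int] = [0]
    out: list[list[str]] = [[]]

    # phase 1: an ordinary trie of the patterns
    for pattern in patterns:
        node = 0
        for character in pattern:
            if character not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][character] = len(goto) - 1
            node = goto[node][character]
        out[node].append(pattern)

    # phase 2: failure links, shallow nodes first (depth-1 nodes fail to root)
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for character, child in goto[node].items():
            f = fail[node]
            while f and character not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(character, 0)
            # inherit the patterns that end at the failure target
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text: str, patterns: list[str]) -> list[tuple[int, str]]:
    """(start index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    node, hits = 0, []
    for index, character in enumerate(text):
        while node and character not in goto[node]:
            node = fail[node]  # fall back to a shorter suffix
        node = goto[node].get(character, 0)
        for pattern in out[node]:
            hits.append((index - len(pattern) + 1, pattern))
    return hits


print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```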

For storage, a compressed trie (a radix tree or Patricia trie) attacks the one-child chains directly: contract every maximal chain of single-child, unflagged nodes into one edge labeled by the whole substring.³ Every interior node then has at least two children (or is a terminal), which caps the node count at $O (n)$ for $n$ keys — independent of key length — because a tree with $n$ leaves and no unary interior nodes has at most $n - 1$ interior nodes. The stored strings shrink to one pointer-plus-length pair per edge.

In exchange, insertion is more intricate: a new key may match an edge label only partway, forcing an edge split. Take a radix tree holding $\{tea, ten\}$: one edge labeled te leaves the root, then edges a and n branch to the two terminals. Inserting to walks the root edge and mismatches at its second character ($o \neq e$), so the edge splits at the common prefix t: a new interior node takes over, with the remainder e of the old label on one side (keeping its a/n subtree intact) and a fresh o edge on the other. One split per insert suffices, and the operation stays $O(L)$.
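The split can be sketched in a few lines of Python. Everything here is illustrative: `RadixNode`, `radix_insert`, and `edge_labels` are hypothetical names, edges are stored in a dictionary keyed by the first character of their label, and deletion is left out.

```python
class RadixNode:
    def __init__(self) -> None:
        # first character of the label -> (whole label, child node)
        self.edges: dict[str, tuple[str, "RadixNode"]] = {}
        self.is_end = False


def common_prefix_length(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def radix_insert(root: RadixNode, word: str) -> None:
    node, rest = root, word
    while rest:
        entry = node.edges.get(rest[0])
        if entry is None:
            leaf = RadixNode()
            leaf.is_end = True
            node.edges[rest[0]] = (rest, leaf)  # whole remainder on one edge
            return
        label, child = entry
        shared = common_prefix_length(label, rest)
        if shared < len(label):
            # split: a new interior node takes the shared part of the label,
            # and the old child keeps the remainder with its subtree intact
            middle = RadixNode()
            middle.edges[label[shared]] = (label[shared:], child)
            node.edges[rest[0]] = (label[:shared], middle)
            child = middle
        node, rest = child, rest[shared:]
    node.is_end = True


def edge_labels(node: RadixNode) -> list[str]:
    """Edge labels in preorder, for inspecting the shape."""
    labels = []
    for label, child in node.edges.values():
        labels.append(label)
        labels.extend(edge_labels(child))
    return labels


tree = RadixNode()
radix_insert(tree, "tea")
radix_insert(tree, "ten")
before = edge_labels(tree)
radix_insert(tree, "to")
print(before, edge_labels(tree))
# ['te', 'a', 'n'] ['t', 'e', 'a', 'n', 'o']
```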

Radix-tree edge split. Left: {tea, ten} share the contracted edge \texttt{te}. Right: inserting \texttt{to} matches only the \texttt{t}, so the edge splits at a new interior node (dashed); the old \texttt{e} remainder keeps its subtree, and \texttt{o} branches off fresh

Suffix trees and suffix arrays push the compression idea further, indexing all suffixes of a text for fast substring search — the subject of the next lesson.

Where tries go in the real world

Tries are the standard structure for IP routing. A router must match a destination address against a table of prefixes and forward on the longest matching prefix — exactly a prefix search in a binary trie over the address bits. Naive bit-at-a-time tries are too slow for line-rate forwarding, so production routers use compressed and multi-bit variants: the Patricia trie (Morrison, PATRICIA, JACM 1968) collapses one-child chains just as this lesson's radix tree does, and the LC-trie / multibit-trie families (Nilsson & Karlsson, IP-Address Lookup Using LC-Tries, IEEE JSAC 1999) consume several bits per node to bound the depth. The longest-prefix-match problem is the reason tries, rather than hash tables, sit in the data plane of the internet.
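Longest-prefix match on a plain bit-at-a-time trie can be sketched as follows. This is the naive variant the paragraph describes as too slow for line rate, shown only to make the lookup rule concrete. The names `RouteNode`, `add_route`, and `lookup`, the 8-bit addresses, and the next-hop labels are all made up for the example.

```python
from typing import Optional


class RouteNode:
    def __init__(self) -> None:
        self.children: list[Optional["RouteNode"]] = [None, None]
        self.next_hop: Optional[str] = None  # set where a table prefix ends


def add_route(root: RouteNode, prefix_bits: str, next_hop: str) -> None:
    node = root
    for bit in prefix_bits:
        index = int(bit)
        if node.children[index] is None:
            node.children[index] = RouteNode()
        node = node.children[index]
    node.next_hop = next_hop


def lookup(root: RouteNode, address_bits: str) -> Optional[str]:
    node, best = root, root.next_hop
    for bit in address_bits:
        node = node.children[int(bit)]
        if node is None:
            break  # no longer prefix can match
        if node.next_hop is not None:
            best = node.next_hop  # a longer matching prefix overrides
    return best


table = RouteNode()
add_route(table, "", "default")   # hypothetical catch-all route
add_route(table, "10", "A")
add_route(table, "1011", "B")
print(lookup(table, "10110001"), lookup(table, "10010001"), lookup(table, "01000000"))
# B A default
```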

The compression idea also scales to enormous static dictionaries through the DAWG (directed acyclic word graph): merge not only shared prefixes, as a trie does, but also shared suffixes, turning the trie into a minimal deterministic automaton for the word set. This is the classic representation for spell-checkers and Scrabble engines, storing hundreds of thousands of words in a few hundred kilobytes. Pushed to indexing every substring of a text rather than a fixed word list, the same automaton idea becomes the suffix automaton, a cousin of the suffix arrays and Aho–Corasick automaton of the next lesson.

For storing a large set of strings where you only need membership tests and can tolerate a small false-positive rate, tries compete with succinct alternatives. A Bloom filter answers "have I seen this key?" in constant space per element with no per-key pointers; a trie answers the same question exactly and additionally supports prefix and ordered queries, at the cost of the per-node pointer storage a Bloom filter avoids.

Takeaways

A trie is a rooted tree whose edges are labeled by characters; a root-to-node path spells a prefix, and an $isEnd$ flag marks nodes that complete a stored key. The flag is what distinguishes in stored as a word from in existing only as a prefix of inn. Children are an array of size $∣Σ∣$ or a per-node map.
Insert, search, delete, and startsWith all run in $O(L)$, the key length, and are independent of $n$, beating a BST's $O(L \log n)$. Delete must prune on the way back: unlink nodes left flagless and childless, stopping at the first node another word still needs.
Space is up to $O(n \cdot L \cdot ∣Σ∣)$ with array children ($26 \times 8 = 208$ bytes of slots per node, used or not), but shared prefixes are stored once, so prefix-heavy sets compress well; tries beat hash sets by giving ordered traversal, prefix queries, and no collisions, while hash sets win pure membership tests on cache behavior.
Tries power autocomplete ($O(L + s)$, proportional to the output), wildcard . matching, and board word-search pruning; over $\{0, 1\}$ a binary trie solves maximum-XOR pair in $O(n \cdot b)$ by greedily walking toward the opposite bit, which is safe because $2^{k} > 2^{k} - 1$, the worth of all lower bits combined.
Aho–Corasick = trie + failure links = multi-pattern KMP; Patricia / radix trees contract single-child chains into substring-labeled edges, splitting an edge when an insert matches its label only partway, which caps the node count at $O (n)$ ; suffix trees / arrays index all suffixes of a text.

¹ Skiena, § — String Data Structures: tries route keys character-by-character, giving $O(L)$ search independent of the number of stored strings.
² Erickson, Ch. — Data Structures: tries as a string dictionary; failure links extend a trie into the Aho–Corasick multi-pattern matcher.
³ CLRS, Problem 12-2 — Radix trees: the trie over bit strings, sorted output by preorder traversal, and the compressed form.
