Graphs/Lowest Common Ancestor & Binary Lifting

Lesson 6.71,451 words

Lowest Common Ancestor & Binary Lifting

Given a rooted tree, the lowest common ancestor of uu and vv is the deepest node that is an ancestor of both. A naive walk answers one query in O(h)O(h); binary lifting precomputes the 2k2^k-th ancestor of every node in O(nlogn)O(n\log n), then answers kk-th-ancestor and LCA queries in O(logn)O(\log n) each.

╌╌╌╌

The previous lessons gave us a rooted tree and a single root-to-node path for each vertex. Many problems instead ask about two vertices at once: how far apart are and ? what is the highest fork their paths share? which company contains both of two nested regions? Each reduces to one question: the lowest common ancestor.

The LCA is well defined and unique: the sets of ancestors of and of are each a chain from the root, so their intersection is a chain, and a finite chain has a unique deepest element.

The naive walk

If every node stores a parent pointer and a depth, one query is easy. Lift the deeper of until both sit at the same depth, then advance both pointers up in lockstep; the first node they agree on is the LCA.

Algorithm:Naive-LCA(u,v)\textsc{Naive-LCA}(u, v) — climb to equal depth, then together
  1. 1
    while depth[u]>depth[v]depth[u] > depth[v] do
  2. 2
    uparent[u]u \gets parent[u]
  3. 3
    while depth[v]>depth[u]depth[v] > depth[u] do
  4. 4
    vparent[v]v \gets parent[v]
  5. 5
    while uvu \ne v do
  6. 6
    uparent[u]u \gets parent[u]
  7. 7
    vparent[v]v \gets parent[v]
  8. 8
    return uu

This needs no preprocessing and is correct, but each step moves up one edge, so a query costs where is the tree's height. On a balanced tree , but on a degenerate path , and queries cost . We want a query cost that does not depend on shape. (For the asymptotic notation, see asymptotic analysis.)

Binary lifting

The fix is to make each jump cover an exponentially larger distance. Instead of go up one, precompute, for every node and every , a pointer that goes up edges at once.

The whole table is built from a single doubling identity: climbing edges is climbing edges twice.

So column of the table is computed entirely from column , one pass per power of two. The number of columns is , since no node has an ancestor more than edges up.

Algorithm:Build-Up(T)\textsc{Build-Up}(T) — preprocess 2k2^k-th ancestors via doubling
  1. 1
    run a DFS/BFS from the root to fill parent[]parent[\cdot] and depth[]depth[\cdot]
  2. 2
    for each node vv do
  3. 3
    up[v][0]parent[v]up[v][0] \gets parent[v]
    root points to itself
  4. 4
    for k1k \gets 1 to KK do
  5. 5
    for each node vv do
  6. 6
    up[v][k]up[up[v][k1]][k1]up[v][k] \gets up[\,up[v][k-1]\,][k-1]

The table has entries and each costs , so preprocessing is time and space.

doubling identity: (a -edge climb) is two jumps of edges each

-th ancestor in

Any non-negative integer has a unique binary expansion, so the climb of edges decomposes into jumps of size , one jump per set bit. Take each set bit from low to high and follow the matching column of up.

Algorithm:Kth-Ancestor(v,k)\textsc{Kth-Ancestor}(v, k) — jump by each 1-bit of kk
  1. 1
    for j0j \gets 0 to KK do
  2. 2
    if kk has bit jj set then
  3. 3
    vup[v][j]v \gets up[v][j]
  4. 4
    if v=nilv = \text{nil} then return nil
    ran off the root
  5. 5
    return vv

At most bits are set, so this is . The order of the jumps does not matter for the destination (they compose to the same total climb), but processing low bits first keeps the running node well-defined at each step.

splits the -edge climb into a jump then a jump (one per set bit)

LCA in

The LCA query reuses the same jumps as two phases. Phase 1 lifts the deeper node up by exactly , a single call, so and sit at equal depth. If they now coincide, one was an ancestor of the other and we are done. Phase 2 lifts both nodes simultaneously: scanning from high to low, we jump both up by only when that keeps them distinct. When the loop ends, and are the two distinct children-side nodes just below the LCA, so the answer is their common parent.

Algorithm:LCA(u,v)\textsc{LCA}(u, v) — equalize depth, then jump both up greedily
  1. 1
    if depth[u]<depth[v]depth[u] < depth[v] then swap u,vu, v
  2. 2
    uKth-Ancestor(u, depth[u]depth[v])u \gets \textsc{Kth-Ancestor}(u,\ depth[u] - depth[v])
    phase 1
  3. 3
    if u=vu = v then return uu
  4. 4
    for kKk \gets K downto 00 do
    phase 2
  5. 5
    if up[u][k]up[v][k]up[u][k] \ne up[v][k] then
  6. 6
    uup[u][k]u \gets up[u][k]
  7. 7
    vup[v][k]v \gets up[v][k]
  8. 8
    return up[u][0]up[u][0]
    their common parent

The depth-equalizing jump is and the second loop runs times, so each LCA query is after the one-time build.

Lift the deeper node to equal depth, then jump both up by decreasing powers of two; the LCA is highlighted

Here and already share depth; both lift to and (kept distinct), then one parent step lands on , drawn in acc.

A worked table

The whole method lives in the up grid. For the small tree below, rows are nodes and columns are , with each entry the -th ancestor. Column is the parent; every later column is the previous column composed with itself.

The table — -th ancestors, each column doubled from the last; one query's jump cells in acc

Reading row : its st ancestor is (), its nd is , its th is . To find the th ancestor of we write and jump by the and columns — cells highlighted in acc — climbing then edges; on this depth- tree that runs off the root, which reports as nil.

Application: tree distance and path queries

LCA turns a two-vertex path question into arithmetic on depths. The unique path from to in a tree goes up from to and back down to , so its length is

Each query is one LCA plus work, hence . The same decomposition answers is on the path?, aggregates a value along the path (split into the two vertical legs), or, combined with , emits step-by-step U/L/R directions: climb steps up, then walk the recorded downward path to .

Alternatives

Binary lifting is the most broadly useful LCA method, but two others are worth knowing.1

  • Euler tour + sparse-table RMQ. Record the Euler traversal of the tree (each node appended on entry and after each child returns); within it, the LCA of and is the shallowest node visited between any occurrence of and of . That reduces LCA to a range-minimum query over the depth array, which a sparse table answers in after an build.2 So queries drop to , but the structure is static and does not directly give -th ancestors.
    The reduction is best seen laid out. Below the tree, the Euler tour writes each node as it is entered and re-entered, with its depth underneath. The LCA of and is the shallowest entry anywhere between an occurrence of and one of — i.e. the minimum of that depth subarray (shaded), which here is :
Euler tour reduces LCA to range-minimum: between and in the tour, the shallowest (minimum-depth) entry is .
  • Tarjan's offline LCA. If all query pairs are known in advance, a single DFS with a union-find structure answers them in near-linear total time, processing each query when its second endpoint is first reached.3

Takeaways

  • The lowest common ancestor of and is the deepest node ancestral to both; it is unique because ancestor sets are root-chains.
  • The naive walk (equalize depth, climb together) needs no preprocessing but costs per query, or on a degenerate tree.
  • Binary lifting precomputes , the -th ancestor, via the doubling identity in time and space.
  • A -th-ancestor query jumps by each -bit of ; an LCA query lifts the deeper node to equal depth, then jumps both up by decreasing powers of two while they stay distinct — each .
  • Tree distance is , turning path queries into arithmetic.
  • Alternatives: Euler tour + sparse-table RMQ gives queries on a static tree; Tarjan's union-find DFS answers all queries offline in near-linear time. Binary lifting wins on being online and also serving -th ancestors.

Footnotes

  1. Skiena, § — Trees / LCA: survey of LCA strategies and the preprocessing/query trade-off across query models.
  2. Erickson, Ch. — Trees: the Euler-tour reduction of LCA to range-minimum, with sparse-table RMQ giving queries after preprocessing.
  3. CLRS, Ch. — (trees): Tarjan's offline LCA via depth-first search and disjoint-set union, near-linear in .
Practice

╌╌ END ╌╌