Graphs/Topological Sort and Strong Connectivity

Lesson 6.21,658 words

Topological Sort and Strong Connectivity

Directed acyclic graphs model dependencies: tasks that must precede other tasks. A topological order lays such a graph out in a line so every edge points forward, and depth-first finish times hand it to us almost for free.

╌╌╌╌

Many problems are really questions about order. To compile a program you must build each module before the ones that depend on it; to follow a recipe you must chop before you sauté; to finish a degree you must clear the prerequisites of each course. Each of these is a directed acyclic graph, a digraph with no directed cycles, and the task of find a consistent order is topological sorting. Depth-first search, with its finish-time timestamps from the previous lesson, solves it almost incidentally.

Directed acyclic graphs

The absence of cycles is exactly what makes a consistent ordering possible: if task must precede and must precede , no linear order can satisfy both. Here is a small DAG of course prerequisites, where an edge means must come before :

A small DAG of course prerequisites where an edge to means precedes .

Topological order

Picture all the vertices pinned along a horizontal line so that every edge points rightward. The DAG above admits the order , and you can check that each of its six edges goes left to right:

The DAG laid out in topological order with every edge pointing rightward.

Topological orders are usually not unique; works equally well. Two facts tie ordering to acyclicity:

The forward direction is immediate: a directed cycle could never be laid out with all edges pointing the same way. The converse, that every DAG has such an order, is what the algorithm below constructs.

Topological sort via DFS finish times

The key observation, drawn out by all three texts, links acyclicity to depth-first search's edge classification from the previous lesson:1

A back edge runs from to a gray ancestor ; the tree path from down to together with that edge forms a cycle. Conversely, in any cycle the first vertex discovered becomes a gray ancestor of the others, so the edge closing the cycle is a back edge. Since DAGs have no back edges, every edge of a DAG is a tree, forward, or cross edge, and for all three of those, the finish times obey . That single inequality is the whole trick:

So we run DFS, and as each vertex finishes we push it onto the front of a list. When DFS completes, the list reads off a valid topological order.2

Algorithm 1:Topological-Sort(G)\textsc{Topological-Sort}(G) — order a DAG by DFS finish times
  1. 1
    LL \gets empty linked list
  2. 2
    foreach vertex uVu \in V do
  3. 3
    u.colorwhiteu.color \gets \text{white}
  4. 4
    foreach vertex uVu \in V do
  5. 5
    if u.color=whiteu.color = \text{white} then
  6. 6
    call TS-Visit(G,u,L)\textsc{TS-Visit}(G, u, L)
  7. 7
    return LL
Algorithm 2:TS-Visit(G,u,L)\textsc{TS-Visit}(G, u, L) — finish uu, then prepend it to LL
  1. 1
    u.colorgrayu.color \gets \text{gray}
  2. 2
    foreach vv adjacent to uu do
  3. 3
    if v.color=whitev.color = \text{white} then
  4. 4
    call TS-Visit(G,v,L)\textsc{TS-Visit}(G, v, L)
  5. 5
    u.colorblacku.color \gets \text{black}
  6. 6
    prepend uu to the front of LL
    smaller finish goes later

Correctness. Consider any edge . When DFS explores it, is white, gray, or black. It cannot be gray — that would be a back edge, impossible in a DAG. If is white, it becomes a descendant of and finishes first, so . If is black, it already finished, so again . In every case finishes after , so prepending on finish places before in , the defining property of a topological order.

Running time. This is just DFS plus work per vertex to splice it into the list, so it runs in , which is linear.

Concretely, run DFS on the prerequisite DAG from (ties alphabetical) and read off each vertex's finish time . Sorting the vertices by decreasing drops them straight into a valid topological order — and every edge, drawn below the sorted line, points rightward:

DFS finish times on the DAG (left); sorting by decreasing gives the topological order with all edges forward (right).

Why a topological order matters: the evaluation DAG

A problem that looks unrelated brings topological order to life: computing Fibonacci numbers . The naive recursion branches into and , and its call tree is exponential, but most of its nodes are duplicates. If we collapse the identical subproblems into single nodes, the recursion tree becomes a small DAG: one node per value , with an edge whenever computing needs .

The Fibonacci evaluation DAG with one node per value and an edge to each needed subproblem.

To compute we must evaluate every node after the nodes it points to are already known, that is, in a valid topological order of the evaluation DAG. Here the order is read off by level: . Filling an array in that order computes each value with work, turning exponential recursion into a single linear sweep:

Algorithm 3:Dyn-Fibo(n)\textsc{Dyn-Fibo}(n) — evaluate the DAG in topological order
  1. 1
    allocate array F[0..n]F[0 \mathbin{..} n]
  2. 2
    F[0]0F[0] \gets 0; F[1]1F[1] \gets 1
  3. 3
    for i2i \gets 2 to nn do
  4. 4
    F[i]F[i1]+F[i2]F[i] \gets F[i-1] + F[i-2]
    predecessors already done
  5. 5
    return F[n]F[n]

The lesson generalizes: whenever quantities depend on one another acyclically, their dependency digraph is a DAG, and a topological order is exactly a safe order in which to evaluate them, predecessors first. This is the structural backbone of dynamic programming, which we return to later.

Running Kahn's algorithm on the prerequisite DAG, is the only source. Emit it, and removing its edges drops and to in-degree ; emitting then frees , and finally . The result is the same order that DFS finish times produced:

Kahn's algorithm on the prerequisite DAG: each vertex tagged with its in-degree; repeatedly emit an in-degree- vertex, yielding the order .

Strong connectivity

DAGs are the cycle-free case. What can we say about a general digraph, cycles and all? The right notion of connected for directed graphs is mutual reachability.

Strong connectivity partitions into SCCs. Collapsing each component to a single super-vertex yields the component graph (or condensation), and a beautiful fact holds:

If it had a cycle, the components on that cycle could all reach one another and would have been merged into one larger component, contradicting maximality. So every directed graph is, at the coarse level of its components, a DAG. SCCs are the standard first step in analyzing a digraph: find the components, contract them, and reason about the resulting DAG.

A digraph with two strongly connected components and linked by edges from to .

Here form one SCC (each reaches the other) and another, and the only edges between the two groups run from to . Collapsing each component to a super-vertex leaves the two-node condensation, itself a DAG, with its own trivial topological order then :

The two-node condensation DAG with component pointing to component .

Kosaraju's two-pass algorithm

The cleanest way to find SCCs, due to Kosaraju and Sharir, is two depth-first searches with a transpose in between.3 The transpose is with every edge reversed; crucially, it has exactly the same SCCs as , because reversing all edges does not affect mutual reachability.

Algorithm 4:Strongly-Connected-Components(G)\textsc{Strongly-Connected-Components}(G) — Kosaraju's two passes
  1. 1
    call DFS(G)\textsc{DFS}(G) to compute the finish time u.fu.f for each vertex uu
  2. 2
    compute GTG^{\mathsf{T}}
    reverse all edges
  3. 3
    call DFS(GT)\textsc{DFS}(G^{\mathsf{T}}), considering vertices in order of decreasing u.fu.f
  4. 4
    output the vertices of each tree in the second forest as one SCC

Why it works (the intuition). Imagine the component graph laid out in topological order, sources on the left. The first DFS on assigns the largest finish time to a vertex in a source component of that DAG. When we then run DFS on , where every component-graph edge is reversed, and start from that highest-finishing vertex, we are launching from a sink of the reversed component graph. From a sink, the search cannot leak into any other component, so it visits exactly one SCC and stops. Peeling components off in decreasing finish order keeps this true at every step. The whole procedure is two DFS passes plus a transpose, all linear, so SCCs cost .

The picture below makes the source/sink reversal concrete. The first DFS on a graph with three SCCs lands the largest finish time () inside the source component ; reversing the edges turns that source into a sink, so the second DFS, launched from , is trapped inside one SCC and peels it off cleanly:

Kosaraju's idea: the source SCC of holds the highest finish time, so the second DFS on launches from a sink and peels one SCC at a time.

Takeaways

  • A DAG is a directed graph with no cycle; it has a topological order (every edge points forward) if and only if it is acyclic.
  • DFS detects acyclicity by the absence of back edges, and listing vertices in decreasing finish time yields a topological order in .
  • A topological order is exactly a safe evaluation order for acyclically dependent quantities, predecessors first. Collapsing the Fibonacci recursion into its evaluation DAG and sweeping it in topological order turns exponential recursion into a linear pass, the seed of dynamic programming.
  • Strongly connected components are maximal mutually-reachable vertex sets; contracting them always produces a DAG.
  • Kosaraju's two-pass DFS (run DFS, transpose, run DFS in decreasing finish order) finds all SCCs in ; Tarjan's low-link method does it in one pass.

Footnotes

  1. Erickson, Ch. 6 — Depth-First Search — a digraph is acyclic iff DFS finds no back edge.
  2. CLRS, Ch. 22 — Elementary Graph Algorithms — topological sort by decreasing DFS finish time in .
  3. Skiena, §5 — Graph Traversal — finding strongly connected components via two DFS passes.
Practice

╌╌ END ╌╌