Topological Sort and Strong Connectivity
Directed acyclic graphs model dependencies: tasks that must precede other tasks. A topological order lays such a graph out in a line so every edge points forward, and depth-first finish times hand it to us almost for free.
╌╌╌╌
Many problems are really questions about order. To compile a program you must
build each module before the ones that depend on it; to follow a recipe you must
chop before you sauté; to finish a degree you must clear the prerequisites of
each course. Each of these is a directed acyclic graph, a digraph with no
directed cycles, and the task of find a consistent order
is topological
sorting. Depth-first search,
with its finish-time timestamps from the previous lesson, solves it almost
incidentally.
Directed acyclic graphs
The absence of cycles is exactly what makes a consistent ordering possible: if
task must precede and must precede , no linear order can satisfy
both. Here is a small DAG of course prerequisites, where an edge means
must come before
:
Topological order
Picture all the vertices pinned along a horizontal line so that every edge points rightward. The DAG above admits the order , and you can check that each of its six edges goes left to right:
Topological orders are usually not unique; works equally well. Two facts tie ordering to acyclicity:
The forward direction is immediate: a directed cycle could never be laid out with all edges pointing the same way. The converse, that every DAG has such an order, is what the algorithm below constructs.
Topological sort via DFS finish times
The key observation, drawn out by all three texts, links acyclicity to depth-first search's edge classification from the previous lesson:1
A back edge runs from to a gray ancestor ; the tree path from down to together with that edge forms a cycle. Conversely, in any cycle the first vertex discovered becomes a gray ancestor of the others, so the edge closing the cycle is a back edge. Since DAGs have no back edges, every edge of a DAG is a tree, forward, or cross edge, and for all three of those, the finish times obey . That single inequality is the whole trick:
So we run DFS, and as each vertex finishes we push it onto the front of a list. When DFS completes, the list reads off a valid topological order.2
- 1empty linked list
- 2foreach vertex do
- 3
- 4foreach vertex do
- 5if then
- 6call
- 7return
- 1
- 2foreach adjacent to do
- 3if then
- 4call
- 5
- 6prepend to the front ofsmaller finish goes later
Correctness. Consider any edge . When DFS explores it, is white, gray, or black. It cannot be gray — that would be a back edge, impossible in a DAG. If is white, it becomes a descendant of and finishes first, so . If is black, it already finished, so again . In every case finishes after , so prepending on finish places before in , the defining property of a topological order.
Running time. This is just DFS plus work per vertex to splice it into the list, so it runs in , which is linear.
Concretely, run DFS on the prerequisite DAG from (ties alphabetical) and read off each vertex's finish time . Sorting the vertices by decreasing drops them straight into a valid topological order — and every edge, drawn below the sorted line, points rightward:
Why a topological order matters: the evaluation DAG
A problem that looks unrelated brings topological order to life: computing Fibonacci numbers . The naive recursion branches into and , and its call tree is exponential, but most of its nodes are duplicates. If we collapse the identical subproblems into single nodes, the recursion tree becomes a small DAG: one node per value , with an edge whenever computing needs .
To compute we must evaluate every node after the nodes it points to are already known, that is, in a valid topological order of the evaluation DAG. Here the order is read off by level: . Filling an array in that order computes each value with work, turning exponential recursion into a single linear sweep:
- 1allocate array
- 2;
- 3for to do
- 4predecessors already done
- 5return
The lesson generalizes: whenever quantities depend on one another acyclically, their dependency digraph is a DAG, and a topological order is exactly a safe order in which to evaluate them, predecessors first. This is the structural backbone of dynamic programming, which we return to later.
Running Kahn's algorithm on the prerequisite DAG, is the only source. Emit it, and removing its edges drops and to in-degree ; emitting then frees , and finally . The result is the same order that DFS finish times produced:
Strong connectivity
DAGs are the cycle-free case. What can we say about a general digraph, cycles
and all? The right notion of connected
for directed graphs is mutual
reachability.
Strong connectivity partitions into SCCs. Collapsing each component to a single super-vertex yields the component graph (or condensation), and a beautiful fact holds:
If it had a cycle, the components on that cycle could all reach one another and would have been merged into one larger component, contradicting maximality. So every directed graph is, at the coarse level of its components, a DAG. SCCs are the standard first step in analyzing a digraph: find the components, contract them, and reason about the resulting DAG.
Here form one SCC (each reaches the other) and another, and the only edges between the two groups run from to . Collapsing each component to a super-vertex leaves the two-node condensation, itself a DAG, with its own trivial topological order then :
Kosaraju's two-pass algorithm
The cleanest way to find SCCs, due to Kosaraju and Sharir, is two depth-first searches with a transpose in between.3 The transpose is with every edge reversed; crucially, it has exactly the same SCCs as , because reversing all edges does not affect mutual reachability.
- 1call to compute the finish time for each vertex
- 2computereverse all edges
- 3call , considering vertices in order of decreasing
- 4output the vertices of each tree in the second forest as one SCC
Why it works (the intuition). Imagine the component graph laid out in topological order, sources on the left. The first DFS on assigns the largest finish time to a vertex in a source component of that DAG. When we then run DFS on , where every component-graph edge is reversed, and start from that highest-finishing vertex, we are launching from a sink of the reversed component graph. From a sink, the search cannot leak into any other component, so it visits exactly one SCC and stops. Peeling components off in decreasing finish order keeps this true at every step. The whole procedure is two DFS passes plus a transpose, all linear, so SCCs cost .
The picture below makes the source/sink reversal concrete. The first DFS on a graph with three SCCs lands the largest finish time () inside the source component ; reversing the edges turns that source into a sink, so the second DFS, launched from , is trapped inside one SCC and peels it off cleanly:
Takeaways
- A DAG is a directed graph with no cycle; it has a topological order (every edge points forward) if and only if it is acyclic.
- DFS detects acyclicity by the absence of back edges, and listing vertices in decreasing finish time yields a topological order in .
- A topological order is exactly a safe evaluation order for acyclically dependent quantities, predecessors first. Collapsing the Fibonacci recursion into its evaluation DAG and sweeping it in topological order turns exponential recursion into a linear pass, the seed of dynamic programming.
- Strongly connected components are maximal mutually-reachable vertex sets; contracting them always produces a DAG.
- Kosaraju's two-pass DFS (run DFS, transpose, run DFS in decreasing finish order) finds all SCCs in ; Tarjan's low-link method does it in one pass.
Footnotes
╌╌ END ╌╌