Topological Sort and Strong Connectivity

Many problems are really questions about order. To compile a program you must build each module before the ones that depend on it; to follow a recipe you must chop before you sauté; to finish a degree you must clear the prerequisites of each course. Each of these dependency structures is a directed acyclic graph (DAG), a digraph with no directed cycles, and the task of finding a consistent order is topological sorting. Depth-first search, with its finish-time timestamps from the previous lesson, solves it almost incidentally.

Directed acyclic graphs

The absence of cycles is what makes a consistent ordering possible: if task $a$ must precede $b$ and $b$ must precede $a$ , no linear order can satisfy both. Here is a small DAG of course prerequisites, where an edge $u \to v$ means $u$ must come before $v$ :

Figure: A small DAG of course prerequisites, where an edge $u \to v$ means $u$ precedes $v$.

Topological order

Picture all the vertices pinned along a horizontal line so that every edge points rightward. The DAG above admits the order $a, b, d, e, c$ ; each of its six edges goes left to right:

Figure: The DAG laid out in topological order $a, b, d, e, c$, with every edge pointing rightward.
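The "every edge points rightward" picture can be checked mechanically. The following is a minimal sketch rather than part of the lesson's code: the edge list is an assumption read off the Kahn trace later in this lesson (the figure itself is not reproduced here), and `is_topological_order` is a hypothetical helper name.

```python
# Prerequisite DAG used throughout this lesson; the six edges are read off
# the Kahn trace later on (an assumption, since the figure is not shown).
EDGES = [("a", "b"), ("a", "d"), ("b", "c"), ("b", "e"), ("d", "e"), ("e", "c")]


def is_topological_order(order, edges):
  """True when every edge (u, v) has u strictly left of v in `order`."""
  position = {label: index for index, label in enumerate(order)}
  # a valid order lists each vertex exactly once
  if len(position) != len(order):
    return False
  return all(position[u] < position[v] for u, v in edges)


print(is_topological_order(["a", "b", "d", "e", "c"], EDGES))  # True
print(is_topological_order(["a", "d", "b", "e", "c"], EDGES))  # True
print(is_topological_order(["a", "b", "e", "d", "c"], EDGES))  # False: d -> e points left
```

The check costs one pass over the edges, so it is a cheap assertion to keep next to any topological-sort implementation.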

Topological orders are usually not unique; $a, d, b, e, c$ works equally well. Two facts tie ordering to acyclicity:

Topological sort via DFS finish times

All three texts draw out the same observation, linking acyclicity to depth-first search's edge classification from the previous lesson:¹

Since DAGs have no back edges, every edge $(u, v)$ of a DAG is a tree, forward, or cross edge, and for all three kinds the finish times obey $u.f > v.f$. That single inequality gives the algorithm:

So we run DFS, and as each vertex finishes we push it onto the front of a list. When DFS completes, the list reads off a valid topological order.²

Algorithm 1: $\textsc{Topological-Sort}(G)$: order a DAG by DFS finish times

$L \gets$ empty linked list
foreach vertex $u \in V$ do
    $u.color \gets \text{white}$
foreach vertex $u \in V$ do
    if $u.color = \text{white}$ then
        call $\textsc{TS-Visit}(G, u, L)$
return $L$

Algorithm 2: $\textsc{TS-Visit}(G, u, L)$: finish $u$, then prepend it to $L$

$u.color \gets \text{gray}$
foreach $v$ adjacent to $u$ do
    if $v.color = \text{white}$ then
        call $\textsc{TS-Visit}(G, v, L)$
$u.color \gets \text{black}$
prepend $u$ to the front of $L$    // smaller finish goes later

Running time. This is just DFS plus $O(1)$ work per vertex to splice it into the list, so it runs in $Θ(V + E)$, which is linear.

topological_sort.py (Python)

from collections.abc import Hashable
from enum import Enum
from typing import Optional, TypeVar

from graph import Graph, Vertex

Label = TypeVar("Label", bound=Hashable)

class Color(Enum):
  """
    DFS vertex states: undiscovered, on the recursion stack, finished.\n
  """
  WHITE = 0
  GRAY = 1
  BLACK = 2

class CycleError(Exception):
  """
    Raised when a directed cycle is found, so no topological order exists.\n
  """

def topological_sort(graph: Graph[Label]) -> list[Label]:
  """
    A topological order of `graph`'s vertex labels: for every edge\n
    (source -> target), source precedes target in the result.\n
    Raises CycleError if the digraph contains a directed cycle.\n
  """
  # every vertex starts undiscovered.
  color: dict[Label, Color] = {
    vertex.label: Color.WHITE for vertex in graph.vertices
  }
  order: list[Label] = []

  def visit(vertex: Vertex[Label]) -> None:
    color[vertex.label] = Color.GRAY

    for neighbor in vertex.neighbors():
      state: Color = color[neighbor.label]

      # an edge into a vertex still on the stack closes a cycle.
      if state is Color.GRAY:
        raise CycleError(f"cycle through {neighbor.label!r}")
      if state is Color.WHITE:
        visit(neighbor)

    # record on finish; later reversal yields the topological order.
    color[vertex.label] = Color.BLACK
    order.append(vertex.label)

  # launch DFS from each still-undiscovered vertex.
  for vertex in graph.vertices:
    if color[vertex.label] is Color.WHITE:
      visit(vertex)

  # append-on-finish then reverse equals prepend-on-finish.
  order.reverse()
  return order

def is_acyclic(graph: Graph[Label]) -> bool:
  """
    Whether `graph` is a DAG — true exactly when DFS finds no back edge.\n
  """
  try:
    topological_sort(graph)
  except CycleError:
    return False
  return True

def find_cycle(graph: Graph[Label]) -> Optional[list[Label]]:
  """
    One directed cycle as a list of labels (the repeated vertex closes it),\n
    or None when the digraph is acyclic.\n
  """
  # every vertex starts undiscovered; the stack is the active DFS path.
  color: dict[Label, Color] = {
    vertex.label: Color.WHITE for vertex in graph.vertices
  }
  stack: list[Label] = []

  def visit(vertex: Vertex[Label]) -> Optional[list[Label]]:
    color[vertex.label] = Color.GRAY
    stack.append(vertex.label)

    for neighbor in vertex.neighbors():
      state: Color = color[neighbor.label]

      # a gray neighbor is an ancestor: slice the path to close the loop.
      if state is Color.GRAY:
        start: int = stack.index(neighbor.label)
        return stack[start:] + [neighbor.label]

      # recurse into unseen neighbors, bubbling up any cycle found.
      if state is Color.WHITE:
        cycle: Optional[list[Label]] = visit(neighbor)
        if cycle is not None:
          return cycle

    # leave the path once exhausted.
    color[vertex.label] = Color.BLACK
    stack.pop()
    return None

  # search from each undiscovered vertex until a cycle turns up.
  for vertex in graph.vertices:
    if color[vertex.label] is Color.WHITE:
      cycle = visit(vertex)
      if cycle is not None:
        return cycle
  return None

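graph.py (Python)

# Edge and Vertex annotate with Vertex[...] before that class exists;
# postponed evaluation keeps those annotations from raising NameError.
from __future__ import annotations

from collections.abc import Hashable, Iterator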
from typing import Generic, Optional, TypeVar


Label = TypeVar("Label", bound=Hashable)


class Edge(Generic[Label]):
  """
    A directed connection from `source` to `target`, carrying a weight.\n
  """

  def __init__(
    self,
    source: Vertex[Label],
    target: Vertex[Label],
    weight: float = 1.0,
  ) -> None:
    self.source: Vertex[Label] = source
    self.target: Vertex[Label] = target
    self.weight: float = weight

  def __repr__(self) -> str:
    return f"Edge({self.source.label!r} -> {self.target.label!r}, w={self.weight})"


class Vertex(Generic[Label]):
  """
    A graph vertex: a label plus the list of edges leaving it.\n
  """

  def __init__(self, label: Label) -> None:
    self.label: Label = label
    self.outgoing: list[Edge[Label]] = []

  def neighbors(self) -> list[Vertex[Label]]:
    """
      The vertices reachable from this one by a single edge.\n
    """
    return [edge.target for edge in self.outgoing]

  def edge_to(self, label: Label) -> Optional[Edge[Label]]:
    """
      The outgoing edge to the vertex with `label`, or None.\n
    """
    for edge in self.outgoing:
      if edge.target.label == label:
        return edge
    return None

  def __repr__(self) -> str:
    return f"Vertex({self.label!r})"


class Graph(Generic[Label]):
  """
    A graph of Vertex objects linked by Edge objects.\n
    Pass `directed=True` for a digraph; otherwise each `add_edge` inserts\n
    the reverse edge too.\n
  """

  def __init__(self, directed: bool = False) -> None:
    self.directed: bool = directed
    self._vertices: dict[Label, Vertex[Label]] = {}

  def add_vertex(self, label: Label) -> Vertex[Label]:
    """
      Return the vertex for `label`, creating it if it is absent.\n
    """
    # reuse the existing vertex, or mint and register a fresh one.
    vertex = self._vertices.get(label)
    if vertex is None:
      vertex = Vertex(label)
      self._vertices[label] = vertex
    return vertex

  def add_edge(
    self,
    source_label: Label,
    target_label: Label,
    weight: float = 1.0,
  ) -> None:
    """
      Connect two labels (creating either vertex as needed).\n
      Adds the reverse edge as well when the graph is undirected.\n
    """
    source = self.add_vertex(source_label)
    target = self.add_vertex(target_label)

    # link source to target, and mirror it back when undirected.
    source.outgoing.append(Edge(source, target, weight))
    if not self.directed:
      target.outgoing.append(Edge(target, source, weight))

  def vertex(self, label: Label) -> Vertex[Label]:
    """
      The vertex carrying `label` (raises KeyError if absent).\n
    """
    return self._vertices[label]

  @property
  def vertices(self) -> list[Vertex[Label]]:
    """
      Every vertex, in insertion order.\n
    """
    return list(self._vertices.values())

  @property
  def labels(self) -> list[Label]:
    """
      Every vertex label, in insertion order.\n
    """
    return list(self._vertices)

  def edges(self) -> Iterator[Edge[Label]]:
    """
      Each edge once — an undirected edge is yielded a single time.\n
    """
    # track undirected endpoint pairs so each is emitted only once.
    seen: set[frozenset[Label]] = set()

    for vertex in self._vertices.values():
      for edge in vertex.outgoing:
        # skip an undirected edge already yielded from the other endpoint.
        if not self.directed:
          endpoints = frozenset((edge.source.label, edge.target.label))
          if endpoints in seen:
            continue
          seen.add(endpoints)

        yield edge

  def __contains__(self, label: Label) -> bool:
    return label in self._vertices

  def __iter__(self) -> Iterator[Vertex[Label]]:
    return iter(self._vertices.values())

  def __len__(self) -> int:
    return len(self._vertices)

A full trace on the prerequisite DAG

Run the algorithm on the prerequisite DAG, starting at $a$ and taking $a$ 's adjacency list in the order $d, b$ (adjacency-list order is arbitrary; this one keeps the trace short). Every discovery and finish ticks the same clock, so the run is a sequence of ten timestamped events:

time	event	call stack	$L$ afterwards
1	discover $a$	$a$	$⟨ ⟩$
2	discover $d$	$a, d$	$⟨ ⟩$
3	discover $e$	$a, d, e$	$⟨ ⟩$
4	discover $c$	$a, d, e, c$	$⟨ ⟩$
5	finish $c$	$a, d, e$	$⟨ c ⟩$
6	finish $e$	$a, d$	$⟨ e, c ⟩$
7	finish $d$	$a$	$⟨ d, e, c ⟩$
8	discover $b$	$a, b$	$⟨ d, e, c ⟩$
9	finish $b$	$a$	$⟨ b, d, e, c ⟩$
10	finish $a$	—	$⟨ a, b, d, e, c ⟩$

At time 8, $b$ inspects its neighbors $c$ and $e$, finds both black (finished), and finishes immediately; those two edges become cross edges, and the inequalities $b.f > c.f$ and $b.f > e.f$ hold for them just as they do for tree edges. The final list $⟨ a, b, d, e, c ⟩$ is the topological order, and sorting the vertices by decreasing $f$ reproduces it exactly. Every edge, drawn below the sorted line, points rightward:

Figure: DFS finish times on the DAG (top); sorting by decreasing $f$ gives the topological order $a, b, d, e, c$ with all edges forward (bottom).
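The trace can be replayed in a few lines. This is a minimal sketch under stated assumptions: the graph is a plain adjacency dictionary rather than the lesson's `Graph` class, the edge list is read off the traces in this lesson, `dfs_timestamps` is a hypothetical name, and the input is assumed to be acyclic (there is no cycle check).

```python
def dfs_timestamps(adjacency):
  """DFS over an adjacency dict, returning (discover, finish, order)."""
  discover, finish, order = {}, {}, []
  clock = 0

  def visit(u):
    nonlocal clock
    clock += 1
    discover[u] = clock
    for v in adjacency[u]:
      if v not in discover:
        visit(v)
    clock += 1
    finish[u] = clock
    # prepend on finish; O(n) on a Python list, acceptable for a demo
    order.insert(0, u)

  for u in adjacency:
    if u not in discover:
      visit(u)
  return discover, finish, order


# a's adjacency list is taken as d, b to match the trace above.
ADJ = {"a": ["d", "b"], "b": ["c", "e"], "c": [], "d": ["e"], "e": ["c"]}
discover, finish, order = dfs_timestamps(ADJ)
print(finish)  # {'c': 5, 'e': 6, 'd': 7, 'b': 9, 'a': 10}
print(order)   # ['a', 'b', 'd', 'e', 'c']
print(sorted(finish, key=finish.get, reverse=True) == order)  # True
```

The last line confirms the claim in the text: sorting by decreasing finish time and prepending on finish give the same list.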

Why a topological order matters: the evaluation DAG

For example, take computing Fibonacci numbers $F_{n} = F_{n - 1} + F_{n - 2}$ . The naive recursion $Fibo (n)$ branches into $Fibo (n - 1)$ and $Fibo (n - 2)$ , and its call tree is exponential, but most of its nodes are duplicates. If we collapse the identical subproblems into single nodes, the recursion tree becomes a small DAG: one node per value $F_{i}$ , with an edge $F_{i} \to F_{j}$ whenever computing $F_{i}$ needs $F_{j}$ .

The Fibonacci evaluation DAG with one node per value and an edge to each needed subproblem.

To compute $F_{n}$ we must evaluate every node only after the nodes it points to are known, that is, in reverse topological order of the evaluation DAG (equivalently, a topological order of the same DAG with every edge flipped to run from a subproblem to the value that uses it). Here that order is simply $F_{0}, F_{1}, F_{2}, \dots, F_{n}$. Filling an array in that order computes each value with $O(1)$ work, turning exponential recursion into a single linear sweep:

Algorithm 3: $\textsc{Dyn-Fibo}(n)$: evaluate the DAG, dependencies first

allocate array $F[0 \mathbin{..} n]$
$F[0] \gets 0$; $F[1] \gets 1$
for $i \gets 2$ to $n$ do
    $F[i] \gets F[i-1] + F[i-2]$    // both dependencies already computed
return $F[n]$

The lesson generalizes: whenever quantities depend on one another acyclically, their dependency digraph is a DAG, and a topological order (with each edge drawn from a prerequisite to the quantity that needs it) gives a safe order in which to evaluate them, prerequisites first. This is the structure underlying dynamic programming, which we return to later.
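The Fibonacci sweep translates directly into Python. This is a minimal sketch: the name `dyn_fibo` and the guard for negative input are assumptions added here, not part of the lesson's pseudocode.

```python
def dyn_fibo(n: int) -> int:
  """F_n by filling a table in the order F_0, F_1, ..., F_n."""
  if n < 0:
    raise ValueError("n must be non-negative")
  if n < 2:
    return n
  table = [0] * (n + 1)
  table[1] = 1
  for i in range(2, n + 1):
    # both dependencies sit at smaller indices, so they are already filled
    table[i] = table[i - 1] + table[i - 2]
  return table[n]


print([dyn_fibo(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Each table entry is written once and read at most twice, which is where the linear running time comes from.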

Kahn's algorithm: peeling sources

Skiena presents the equivalent Kahn's algorithm, which never mentions finish times.³ The idea is an induction on sources. A DAG always has at least one vertex of in-degree $0$ (follow edges backward from any vertex; with no cycle the walk must stop, and it stops at a source). Any source can safely go first in the order, and deleting it leaves a smaller DAG, so repeat.

Algorithm 4: $\textsc{Kahn-Topological-Sort}(G)$: repeatedly emit a source

compute $indeg[v]$ for every $v \in V$    // one pass over all adjacency lists
$Q \gets$ queue of all vertices with $indeg[v] = 0$
$L \gets$ empty list
while $Q$ is nonempty do
    $u \gets$ dequeue $Q$; append $u$ to $L$
    foreach $v$ adjacent to $u$ do
        $indeg[v] \gets indeg[v] - 1$    // delete $u$'s out-edges
        if $indeg[v] = 0$ then enqueue $v$
if $|L| < |V|$ then report "cycle" else return $L$

The deletion is virtual: decrementing $indeg[v]$ stands in for removing the edge $(u, v)$. On the prerequisite DAG the in-degrees start at $a: 0$, $b: 1$, $c: 2$, $d: 1$, $e: 2$, and the run proceeds (breaking queue ties alphabetically):

step	emit	decrements	in-degrees left ( $b, c, d, e$ )	queue after
1	$a$	$b \to 0$ , $d \to 0$	$0, 2, 0, 2$	$b, d$
2	$b$	$c \to 1$ , $e \to 1$	$-, 1, 0, 1$	$d$
3	$d$	$e \to 0$	$-, 1, -, 0$	$e$
4	$e$	$c \to 0$	$-, 0, -, -$	$c$
5	$c$	—	done	empty

The result, $a, b, d, e, c$ , happens to match the DFS order; with a different tie-break ( $d$ before $b$ at step 1) it would produce the equally valid $a, d, b, e, c$ . Two properties fall out of the loop structure:

Cycle detection is free. If the queue empties while vertices remain, every leftover vertex has in-degree $\geq 1$ among the leftovers, and following in-edges backward inside that set forever must revisit a vertex: the leftovers contain a cycle. So $|L| < |V|$ if and only if $G$ is not a DAG, and Kahn's algorithm doubles as a cycle detector.
Counting the orders. Whenever the queue holds $k$ vertices, any of the $k$ may go next; each tie-break policy produces one topological order, and swapping the queue for a min-priority queue produces the lexicographically smallest order at $O(E + V \log V)$ cost.
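The priority-queue variant mentioned above can be sketched with the standard-library `heapq` module. The assumptions are loud ones: the graph is a plain adjacency dictionary in which every vertex appears as a key, labels are mutually comparable, and `smallest_topological_order` is a hypothetical name.

```python
import heapq


def smallest_topological_order(adjacency):
  """Kahn's algorithm with a min-heap; raises ValueError on a cycle."""
  in_degree = {u: 0 for u in adjacency}
  for u in adjacency:
    for v in adjacency[u]:
      in_degree[v] += 1

  # the heap holds every vertex whose in-edges have all been removed
  ready = [u for u, degree in in_degree.items() if degree == 0]
  heapq.heapify(ready)
  order = []
  while ready:
    u = heapq.heappop(ready)  # always the smallest available source
    order.append(u)
    for v in adjacency[u]:
      in_degree[v] -= 1
      if in_degree[v] == 0:
        heapq.heappush(ready, v)

  if len(order) < len(adjacency):
    raise ValueError("graph has a directed cycle")
  return order


ADJ = {"a": ["b", "d"], "b": ["c", "e"], "c": [], "d": ["e"], "e": ["c"]}
print(smallest_topological_order(ADJ))  # ['a', 'b', 'd', 'e', 'c']
```

Each vertex is pushed and popped once at $O(\log V)$ apiece, and each edge still triggers one decrement, which matches the $O(E + V \log V)$ bound.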

Running time. Computing all in-degrees touches every edge once, $Θ (V + E)$ . Each vertex is enqueued and dequeued at most once ( $Θ (V)$ ), and each edge $(u, v)$ triggers exactly one decrement, when $u$ is emitted ( $Θ (E)$ ). Total: $Θ (V + E)$ , matching the DFS method.

kahn_topological_sort.py (Python)

from collections import deque
from collections.abc import Hashable
from typing import Optional, TypeVar

from graph import Graph

Label = TypeVar("Label", bound=Hashable)

class CycleError(Exception):
  """
    Raised when leftover vertices prove the digraph is not a DAG.\n
  """

def kahn_topological_sort(graph: Graph[Label]) -> list[Label]:
  """
    A topological order of `graph`'s labels via in-degree peeling.\n
    Raises CycleError if the digraph contains a directed cycle.\n
  """
  # tally how many edges point into each vertex.
  in_degree: dict[Label, int] = {
    vertex.label: 0 for vertex in graph.vertices
  }
  for vertex in graph.vertices:
    for neighbor in vertex.neighbors():
      in_degree[neighbor.label] += 1

  # every initial source seeds the queue.
  ready: deque[Label] = deque(
    label for label, degree in in_degree.items() if degree == 0
  )
  order: list[Label] = []

  # emit a source, then decrement neighbors to expose the next sources.
  while ready:
    label: Label = ready.popleft()
    order.append(label)
    for neighbor in graph.vertex(label).neighbors():
      in_degree[neighbor.label] -= 1
      if in_degree[neighbor.label] == 0:
        ready.append(neighbor.label)

  # leftover vertices never lost their in-edges, so a cycle remains.
  if len(order) != len(graph):
    raise CycleError("graph has a directed cycle")
  return order

def has_cycle(graph: Graph[Label]) -> bool:
  """
    Whether `graph` contains a directed cycle, decided by Kahn's peeling.\n
  """
  return try_topological_sort(graph) is None

def try_topological_sort(graph: Graph[Label]) -> Optional[list[Label]]:
  """
    The topological order, or None when the digraph is cyclic.\n
  """
  try:
    return kahn_topological_sort(graph)
  except CycleError:
    return None


Figure: Kahn's algorithm on the prerequisite DAG: each vertex is tagged with its in-degree; repeatedly emitting an in-degree-$0$ vertex yields the order $⟨ a, b, d, e, c ⟩$.

Strong connectivity

DAGs are the cycle-free case. What can we say about a general digraph, cycles and all? The right notion of connected for directed graphs is mutual reachability.

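Two vertices are strongly connected when each is reachable from the other. This relation partitions $V$ into maximal mutually reachable sets, the strongly connected components (SCCs). Collapsing each component to a single super-vertex yields the component graph (or condensation), which is always acyclic: two components on a common cycle would be mutually reachable and so would be a single component.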

So every directed graph is, at the coarse level of its components, a DAG. SCCs are the standard first step in analyzing a digraph: find the components, contract them, and reason about the resulting DAG.

Figure: A digraph with two strongly connected components $\{a, b\}$ and $\{c, d\}$, linked by edges from $\{a, b\}$ to $\{c, d\}$.

Here $\{a, b\}$ forms one SCC (each vertex reaches the other) and $\{c, d\}$ another, and the only edges between the two groups run from $\{a, b\}$ to $\{c, d\}$. Collapsing each component to a super-vertex leaves the two-node condensation, itself a DAG, with its own trivial topological order, $\{a, b\}$ then $\{c, d\}$:

Figure: The two-node condensation DAG, with component $\{a, b\}$ pointing to component $\{c, d\}$.
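Mutual reachability can be tested straight from the definition. The sketch below is deliberately naive, running one search per vertex for $O(V(V + E))$ total. The edge set is a hypothetical one chosen to match the description above ($a \leftrightarrow b$, $c \leftrightarrow d$, and a single edge $b \to c$), and both function names are assumptions.

```python
def reachable(adjacency, start):
  """All vertices reachable from `start`, including `start` itself."""
  seen, stack = {start}, [start]
  while stack:
    u = stack.pop()
    for v in adjacency[u]:
      if v not in seen:
        seen.add(v)
        stack.append(v)
  return seen


def components_by_definition(adjacency):
  """Group vertices that reach each other; quadratic, for illustration only."""
  reach = {u: reachable(adjacency, u) for u in adjacency}
  components, assigned = [], set()
  for u in adjacency:
    if u in assigned:
      continue
    # v shares u's component exactly when u reaches v and v reaches u
    component = sorted(v for v in reach[u] if u in reach[v])
    components.append(component)
    assigned.update(component)
  return components


# Hypothetical edge set: a <-> b, c <-> d, and b -> c between the groups.
ADJ = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
print(components_by_definition(ADJ))  # [['a', 'b'], ['c', 'd']]
```

This brute-force version is useful mainly as a reference against which a linear-time SCC routine can be checked on small inputs.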

Kosaraju's two-pass algorithm

The cleanest way to find SCCs, due to Kosaraju and Sharir, is two depth-first searches with a transpose in between.³ The transpose $G^{T}$ is $G$ with every edge reversed. It has exactly the same SCCs as $G$ : a round trip $u \to v \to u$ in $G$ becomes the round trip $u \to v \to u$ in $G^{T}$ traversed the other way, so mutual reachability is untouched.

Algorithm 5: $\textsc{Strongly-Connected-Components}(G)$: Kosaraju's two passes

call $\textsc{DFS}(G)$ to compute the finish time $u.f$ for each vertex $u$
compute $G^{\mathsf{T}}$    // reverse all edges
call $\textsc{DFS}(G^{\mathsf{T}})$, considering vertices in order of decreasing $u.f$
output the vertices of each tree in the second forest as one SCC

Why it works (the intuition). Imagine the component graph laid out in topological order, sources on the left. The first DFS on $G$ assigns the largest finish time to a vertex in a source component of that DAG. When we then run DFS on $G^{T}$ , where every component-graph edge is reversed, and start from that highest-finishing vertex, we are launching from a sink of the reversed component graph. From a sink, the search cannot leak into any other component, so it visits exactly one SCC and stops. Peeling components off in decreasing finish order keeps this true at every step. The whole procedure is two DFS passes plus a transpose, all linear, so SCCs cost $Θ (V + E)$ .

The picture below shows the source/sink reversal. The first DFS on a graph with three SCCs lands the largest finish time ($12$) inside the source component $\{a, b\}$; reversing the edges turns that source into a sink, so the second DFS, launched from $a$, is trapped inside one SCC and peels it off cleanly:

Figure: Kosaraju's idea: the source SCC of $G$ holds the highest finish time, so the second DFS on $G^{T}$ launches from a sink and peels one SCC at a time.

The intuition hardens into two short proofs. Write $f(C) = \max_{u \in C} u.f$ for the largest first-pass finish time inside component $C$.

The lemma (an edge of $G$ from a component $C$ to a different component $C'$ forces $f(C) > f(C')$) says the first pass computes, for free, a topological order of the condensation: listing components by decreasing $f(C)$ lists them source to sink. The second pass exploits it.

Proof. Induct on the trees in the order the second pass grows them. Suppose every earlier tree spanned a whole component, and let $r$ be the next root: the unvisited vertex with the largest finish time, living in component $C$. The search from $r$ in $G^{T}$ reaches all of $C$, since an SCC is strongly connected in both directions and no vertex of $C$ was visited earlier (earlier trees are whole components other than $C$). It reaches nothing else: an edge of $G^{T}$ leaving $C$ points to a component $C''$ that has an edge into $C$ in $G$, so the lemma gives $f(C'') > f(C) \geq r.f$. The vertex realizing $f(C'')$ finished later than every unvisited vertex, so it was already consumed by an earlier tree, and by induction all of $C''$ went with it. The search from $r$ therefore stops at the border of $C$, spanning exactly $C$. $\square$

A complete run

Here is the full machinery on an eight-vertex digraph adapted from CLRS's worked example.² Its components are $C_{1} = \{a, b, e\}$, $C_{2} = \{c, d\}$, $C_{3} = \{f, g\}$, and $C_{4} = \{h\}$. Pass 1 runs DFS on $G$ from $a$ with alphabetical adjacency lists and records finish times: $a$ starts a tree at time $1$ and the exploration order is $a, b, c, d, h, g, f, e$, giving the finish times shown below.

Figure: Pass 1 on $G$: DFS finish times $f$, with the four SCCs boxed. The largest $f$ in each component decreases along the condensation: $16 > 12 > 11 > 6$.

Reading the vertices by decreasing finish time gives the processing order for pass 2:

$a$ (16), $b$ (15), $e$ (14), $c$ (12), $g$ (11), $f$ (10), $d$ (7), $h$ (6).

Pass 2 reverses every edge and launches DFS roots in that order. Each root grows a tree, and each tree is one SCC:

root	reason it starts a tree	tree grown in $G^{T}$	SCC found
$a$	largest $f$ overall ($16$)	$a \to e$, $e \to b$	$\{a, b, e\}$
$c$	largest $f$ still unvisited ($12$)	$c \to d$	$\{c, d\}$
$g$	next unvisited ($11$)	$g \to f$	$\{f, g\}$
$h$	next unvisited ($6$)	(no unvisited neighbor)	$\{h\}$

Each tree halts at the border of its own component. Every reversed edge that leaves a tree ($c \to b$ from the second; $f \to b$, $f \to e$, and $g \to c$ from the third; $h \to d$ and $h \to g$ from the last) lands on an already-visited vertex, because it exits toward a component with larger $f(C)$, which the decreasing-$f$ schedule has already peeled. (The edge $g \to c$ is the reverse of $c \to g$, the edge along which pass 1 discovered $g$.) The first tree needs no such luck: a source component has no incoming edges in $G$, hence no outgoing edges in $G^{T}$, so the search from $a$ is walled in from the start:

Figure: Pass 2 on $G^{T}$ (all edges reversed): roots taken by decreasing $f$ grow the four trees (thick blue tree edges); every other reversed edge leads to an already-visited component.

Contracting the four components produces the condensation, and the maxima $f (C_{1}) = 16 > f (C_{2}) = 12 > f (C_{3}) = 11 > f (C_{4}) = 6$ read off a topological order of it, as the lemma guarantees:

Figure: The condensation of the eight-vertex digraph. Decreasing component maxima $f(C)$ ($16, 12, 11, 6$) give its topological order.
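The whole run can be reproduced with plain dictionaries. This is a sketch under a stated assumption: the edge list below is taken to be the CLRS example without its self-loop on $h$, which reproduces the finish times and components quoted in this section, though the lesson's adapted figure may differ in detail. The function names are hypothetical.

```python
# Assumed edge list (CLRS example, self-loop on h omitted).
ADJ = {
  "a": ["b"],
  "b": ["c", "e", "f"],
  "c": ["d", "g"],
  "d": ["c", "h"],
  "e": ["a", "f"],
  "f": ["g"],
  "g": ["f", "h"],
  "h": [],
}


def finish_times(adjacency):
  """Pass 1: recursive DFS; the dict's key order is increasing finish time."""
  seen, finish = set(), {}
  clock = 0

  def visit(u):
    nonlocal clock
    seen.add(u)
    clock += 1  # discovery tick
    for v in adjacency[u]:
      if v not in seen:
        visit(v)
    clock += 1  # finish tick
    finish[u] = clock

  for u in adjacency:
    if u not in seen:
      visit(u)
  return finish


def transpose(adjacency):
  """Reverse every edge."""
  reversed_adjacency = {u: [] for u in adjacency}
  for u, targets in adjacency.items():
    for v in targets:
      reversed_adjacency[v].append(u)
  return reversed_adjacency


def kosaraju(adjacency):
  """Pass 2: search the transpose, taking roots by decreasing finish time."""
  finish = finish_times(adjacency)
  reversed_adjacency = transpose(adjacency)
  seen, components = set(), []
  # dicts keep insertion order, so reversing the keys needs no sort
  for root in reversed(list(finish)):
    if root in seen:
      continue
    seen.add(root)
    stack, component = [root], []
    while stack:
      u = stack.pop()
      component.append(u)
      for v in reversed_adjacency[u]:
        if v not in seen:
          seen.add(v)
          stack.append(v)
    components.append(sorted(component))
  return finish, components


finish, components = kosaraju(ADJ)
print(finish)      # {'h': 6, 'd': 7, 'f': 10, 'g': 11, 'c': 12, 'e': 14, 'b': 15, 'a': 16}
print(components)  # [['a', 'b', 'e'], ['c', 'd'], ['f', 'g'], ['h']]
```

The components come out in the same order as the table of roots above: $a$, then $c$, then $g$, then $h$.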

Running time, in full. Pass 1 is one DFS: $Θ(V + E)$. No sorting is needed to order vertices by decreasing finish time: push each vertex onto a stack as it finishes and pop the stack in pass 2, $Θ(V)$. Building $G^{T}$ is one scan of the adjacency lists: for each $u$ and each $v \in Adj[u]$, append $u$ to $Adj^{T}[v]$, which is $Θ(V + E)$. Pass 2 is another DFS, $Θ(V + E)$. The sum is three linear passes plus a stack:

$Θ(V + E) + Θ(V) + Θ(V + E) + Θ(V + E) = Θ(V + E).$

kosaraju_scc.py (Python)

from collections.abc import Hashable
from typing import TypeVar

from graph import Graph, Vertex

Label = TypeVar("Label", bound=Hashable)

def _finish_order(graph: Graph[Label]) -> list[Label]:
  """
    Vertex labels in decreasing DFS finish time over `graph`.\n
    Implemented with an explicit stack so deep graphs do not overflow.\n
  """
  visited: set[Label] = set()
  order: list[Label] = []

  for root in graph.vertices:
    if root.label in visited:
      continue

    # each stack frame is (vertex, index of next neighbor to explore).
    stack: list[tuple[Vertex[Label], int]] = [(root, 0)]
    visited.add(root.label)

    while stack:
      vertex, index = stack[-1]
      # index straight into the edge list: rebuilding neighbors() on every
      # revisit would cost O(degree) each time and break the linear bound.
      edges = vertex.outgoing

      # all neighbors explored: this vertex finishes now.
      if index >= len(edges):
        order.append(vertex.label)
        stack.pop()
        continue

      # advance the cursor, then descend into the next unseen neighbor.
      stack[-1] = (vertex, index + 1)
      neighbor = edges[index].target
      if neighbor.label not in visited:
        visited.add(neighbor.label)
        stack.append((neighbor, 0))

  # finish times ascend as appended, so reverse for decreasing order.
  order.reverse()
  return order

def _transpose(graph: Graph[Label]) -> Graph[Label]:
  """
    A new digraph with every edge of `graph` reversed.\n
  """
  # copy every vertex, then re-add each edge with its endpoints swapped.
  reversed_graph: Graph[Label] = Graph(directed=True)
  for vertex in graph.vertices:
    reversed_graph.add_vertex(vertex.label)

  for vertex in graph.vertices:
    for edge in vertex.outgoing:
      reversed_graph.add_edge(edge.target.label, edge.source.label, edge.weight)
  return reversed_graph

def strongly_connected_components(graph: Graph[Label]) -> list[list[Label]]:
  """
    The strongly connected components of `graph`, each a list of labels.\n
    Components come out in a topological order of the condensation: a\n
    source component (reaching the others) precedes the ones it reaches.\n
  """
  order: list[Label] = _finish_order(graph)
  transpose: Graph[Label] = _transpose(graph)
  visited: set[Label] = set()
  components: list[list[Label]] = []

  for label in order:
    if label in visited:
      continue

    # collect the whole transpose-reachable tree from this sink vertex.
    component: list[Label] = []
    stack: list[Vertex[Label]] = [transpose.vertex(label)]
    visited.add(label)

    while stack:
      vertex = stack.pop()
      component.append(vertex.label)
      for neighbor in vertex.neighbors():
        if neighbor.label not in visited:
          visited.add(neighbor.label)
          stack.append(neighbor)
    components.append(component)
  return components

def condensation(graph: Graph[Label]) -> Graph[int]:
  """
    The component graph: one super-vertex per SCC (its index in the\n
    `strongly_connected_components` list), with an edge between two\n
    components whenever `graph` has an edge between their members. The\n
    result is always a DAG.\n
  """
  # map each label to the index of its component.
  components: list[list[Label]] = strongly_connected_components(graph)
  component_of: dict[Label, int] = {}
  for index, members in enumerate(components):
    for label in members:
      component_of[label] = index

  # one super-vertex per component.
  condensed: Graph[int] = Graph(directed=True)
  for index in range(len(components)):
    condensed.add_vertex(index)

  # add one edge per distinct cross-component pair.
  seen: set[tuple[int, int]] = set()
  for vertex in graph.vertices:
    source_component: int = component_of[vertex.label]
    for neighbor in vertex.neighbors():
      target_component: int = component_of[neighbor.label]
      if source_component == target_component:
        continue

      pair: tuple[int, int] = (source_component, target_component)
      if pair not in seen:
        seen.add(pair)
        condensed.add_edge(source_component, target_component)
  return condensed

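graph.py (Python)

# Edge and Vertex annotate with Vertex[...] before that class exists;
# postponed evaluation keeps those annotations from raising NameError.
from __future__ import annotations

from collections.abc import Hashable, Iterator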
from typing import Generic, Optional, TypeVar


Label = TypeVar("Label", bound=Hashable)


class Edge(Generic[Label]):
  """
    A directed connection from `source` to `target`, carrying a weight.\n
  """

  def __init__(
    self,
    source: Vertex[Label],
    target: Vertex[Label],
    weight: float = 1.0,
  ) -> None:
    self.source: Vertex[Label] = source
    self.target: Vertex[Label] = target
    self.weight: float = weight

  def __repr__(self) -> str:
    return f"Edge({self.source.label!r} -> {self.target.label!r}, w={self.weight})"


class Vertex(Generic[Label]):
  """
    A graph vertex: a label plus the list of edges leaving it.\n
  """

  def __init__(self, label: Label) -> None:
    self.label: Label = label
    self.outgoing: list[Edge[Label]] = []

  def neighbors(self) -> list[Vertex[Label]]:
    """
      The vertices reachable from this one by a single edge.\n
    """
    return [edge.target for edge in self.outgoing]

  def edge_to(self, label: Label) -> Optional[Edge[Label]]:
    """
      The outgoing edge to the vertex with `label`, or None.\n
    """
    for edge in self.outgoing:
      if edge.target.label == label:
        return edge
    return None

  def __repr__(self) -> str:
    return f"Vertex({self.label!r})"


class Graph(Generic[Label]):
  """
    A graph of Vertex objects linked by Edge objects.\n
    Pass `directed=True` for a digraph; otherwise each `add_edge` inserts\n
    the reverse edge too.\n
  """

  def __init__(self, directed: bool = False) -> None:
    self.directed: bool = directed
    self._vertices: dict[Label, Vertex[Label]] = {}

  def add_vertex(self, label: Label) -> Vertex[Label]:
    """
      Return the vertex for `label`, creating it if it is absent.\n
    """
    # reuse the existing vertex, or mint and register a fresh one.
    vertex = self._vertices.get(label)
    if vertex is None:
      vertex = Vertex(label)
      self._vertices[label] = vertex
    return vertex

  def add_edge(
    self,
    source_label: Label,
    target_label: Label,
    weight: float = 1.0,
  ) -> None:
    """
      Connect two labels (creating either vertex as needed).\n
      Adds the reverse edge as well when the graph is undirected.\n
    """
    source = self.add_vertex(source_label)
    target = self.add_vertex(target_label)

    # link source to target, and mirror it back when undirected.
    source.outgoing.append(Edge(source, target, weight))
    if not self.directed:
      target.outgoing.append(Edge(target, source, weight))

  def vertex(self, label: Label) -> Vertex[Label]:
    """
      The vertex carrying `label` (raises KeyError if absent).\n
    """
    return self._vertices[label]

  @property
  def vertices(self) -> list[Vertex[Label]]:
    """
      Every vertex, in insertion order.\n
    """
    return list(self._vertices.values())

  @property
  def labels(self) -> list[Label]:
    """
      Every vertex label, in insertion order.\n
    """
    return list(self._vertices)

  def edges(self) -> Iterator[Edge[Label]]:
    """
      Each edge once — an undirected edge is yielded a single time.\n
    """
    # track undirected endpoint pairs so each is emitted only once.
    seen: set[frozenset[Label]] = set()

    for vertex in self._vertices.values():
      for edge in vertex.outgoing:
        # skip an undirected edge already yielded from the other endpoint.
        if not self.directed:
          endpoints = frozenset((edge.source.label, edge.target.label))
          if endpoints in seen:
            continue
          seen.add(endpoints)

        yield edge

  def __contains__(self, label: Label) -> bool:
    return label in self._vertices

  def __iter__(self) -> Iterator[Vertex[Label]]:
    return iter(self._vertices.values())

  def __len__(self) -> int:
    return len(self._vertices)

Common pitfalls

Sorting by discovery time instead of finish time. The two are not interchangeable. On the three-vertex DAG with edges $a \to b$ , $a \to c$ , $c \to b$ , a DFS from $a$ that tries $b$ first discovers vertices in the order $a, b, c$ — and that order violates the edge $c \to b$ . Finish times ( $b$ first, then $c$ , then $a$ , reversed to $a, c, b$ ) are what the theorem guarantees.
Appending instead of prepending. $\textsc{TS-Visit}$ pushes each finished vertex onto the front of $L$ ; appending to the back builds the exact reverse of a topological order. The stack formulation avoids the confusion: push on finish, then pop everything.
Forgetting the outer loop. Both DFS passes must restart from every still-white vertex, not just one chosen source. A DAG can have several sources, and in Kosaraju's second pass the restarts are the whole point — each restart begins a new component.
Running toposort on a cyclic graph without checking. DFS finish times always produce an ordering, even on a cyclic input, where no valid order exists; garbage in, garbage out. Detect the cycle first: a gray-to-gray edge in DFS, or leftover vertices in Kahn's algorithm.
Reversing the wrong thing in Kosaraju. The second pass runs on $G^{T}$ in decreasing finish order of the first pass. Increasing order breaks the invariant that each root's component is a source among the survivors. (The mirror-image variant — first pass on $G^{T}$ , second on $G$ — is fine, since $(G^{T})^{T} = G$ .)
Treating SCCs like undirected components. One directed path between two vertices does not make them strongly connected; the path back must also exist. A digraph can be weakly connected (connected if you ignore directions) yet have $|V|$ singleton SCCs; any weakly connected DAG, such as a single directed path, is an example.
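The discovery-versus-finish, outer-loop, and unchecked-cycle pitfalls can all be seen on the three-vertex DAG from the first item. The following is a minimal sketch, not the lesson's own `Graph`-based code: it assumes a plain dictionary of adjacency lists, and the name `dfs_orders` is hypothetical.

```python
from collections.abc import Hashable, Mapping, Sequence


def dfs_orders(
  adjacency: Mapping[Hashable, Sequence[Hashable]],
) -> tuple[list[Hashable], list[Hashable]]:
  """Return (discovery order, reversed finish order); raise on a cycle."""
  WHITE, GRAY, BLACK = 0, 1, 2
  color = {vertex: WHITE for vertex in adjacency}
  discovered: list[Hashable] = []
  finished: list[Hashable] = []

  def visit(vertex: Hashable) -> None:
    color[vertex] = GRAY
    discovered.append(vertex)
    for neighbor in adjacency[vertex]:
      if color[neighbor] == GRAY:
        # an edge into a vertex still being explored is a back edge.
        raise ValueError(f"cycle through {vertex!r} -> {neighbor!r}")
      if color[neighbor] == WHITE:
        visit(neighbor)
    color[vertex] = BLACK
    finished.append(vertex)

  # outer loop: restart from every still-white vertex.
  for vertex in adjacency:
    if color[vertex] == WHITE:
      visit(vertex)

  # decreasing finish time is the topological order.
  finished.reverse()
  return discovered, finished


# the DAG with edges a -> b, a -> c, c -> b, trying b before c.
dag = {"a": ["b", "c"], "b": [], "c": ["b"]}
discovery_order, topological_order = dfs_orders(dag)
print(discovery_order)    # ['a', 'b', 'c'], which violates c -> b
print(topological_order)  # ['a', 'c', 'b']
```

Raising on a gray neighbor means a cyclic input fails loudly instead of returning a meaningless order. The recursion is kept for readability; a very deep graph would need an explicit stack.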

Condensations and one-pass SCC

Tarjan's one-pass algorithm. Kosaraju runs DFS twice; Tarjan's algorithm (1972) finds SCCs in a single pass.⁴ It carries a low[v] value — the smallest discovery time reachable from $v$ 's subtree via at most one back or cross edge into the current stack — exactly the low-link idea reused in the bridges and articulation points lesson. Vertices are pushed onto an auxiliary stack as they are discovered; when a vertex $v$ finishes with low[v] == disc[v], it is the root of an SCC, and everything above it on the stack is popped off as that component. One DFS, no transpose graph, and the components emerge in reverse topological order for free — which is why competitive-programming 2-SAT solvers almost always use Tarjan.
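The low-link bookkeeping described above might be sketched as follows. This is an illustrative recursive version over a plain adjacency dictionary, not the lesson's `Graph` class; the names `tarjan_scc`, `disc`, `low`, and `on_stack` are assumptions made for the example.

```python
from collections.abc import Hashable, Mapping, Sequence


def tarjan_scc(
  adjacency: Mapping[Hashable, Sequence[Hashable]],
) -> list[list[Hashable]]:
  """Strongly connected components, emitted in reverse topological order."""
  disc: dict[Hashable, int] = {}
  low: dict[Hashable, int] = {}
  stack: list[Hashable] = []
  on_stack: set[Hashable] = set()
  components: list[list[Hashable]] = []

  def visit(v: Hashable) -> None:
    # discovery times are assigned 0, 1, 2, ... in visit order.
    disc[v] = low[v] = len(disc)
    stack.append(v)
    on_stack.add(v)

    for w in adjacency[v]:
      if w not in disc:
        # tree edge: inherit whatever the subtree can reach.
        visit(w)
        low[v] = min(low[v], low[w])
      elif w in on_stack:
        # back or cross edge into the current stack.
        low[v] = min(low[v], disc[w])

    # v is the root of a component: pop the stack down to and including v.
    if low[v] == disc[v]:
      component: list[Hashable] = []
      while True:
        w = stack.pop()
        on_stack.discard(w)
        component.append(w)
        if w == v:
          break
      components.append(component)

  for v in adjacency:
    if v not in disc:
      visit(v)
  return components


# a 3-cycle {a, b, c} with an edge into a 2-cycle {d, e}.
example = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
print([sorted(component) for component in tarjan_scc(example)])
# [['d', 'e'], ['a', 'b', 'c']]
```

The sink component `{d, e}` comes out first, which is the reverse topological order of the condensation. Python's default recursion limit makes this form unsuitable for long paths; an explicit-stack rewrite would be the usual remedy.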

The condensation is the point. Collapsing each SCC to a super-vertex yields the condensation $G^{SCC}$ , always a DAG. Many "is there a path?" and "can everything reach everything?" questions on a general digraph reduce to a topological-order sweep over this DAG: reachability, computing the transitive closure component-wise, finding a single vertex that reaches all others (any vertex of the condensation's source SCC, provided there is exactly one), or adding the fewest edges to make a digraph strongly connected (a classic result of Eswaran and Tarjan counts sources and sinks of the condensation). The two-phase pattern (find SCCs, then run a DAG algorithm on the condensation) is the template behind the whole next stretch of this module, most directly 2-SAT.
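As one concrete instance of the pattern, the source-and-sink count can be read off the condensation without building it explicitly. The sketch below assumes the SCCs are already known and supplied as a hypothetical `component_of` mapping from vertex to component index. It uses the count as it is usually stated: with more than one component, the fewest edges to add is the larger of the number of source components and the number of sink components, where an isolated component counts as both.

```python
from collections.abc import Hashable, Iterable, Mapping


def edges_to_strongly_connect(
  edges: Iterable[tuple[Hashable, Hashable]],
  component_of: Mapping[Hashable, int],
) -> int:
  """Fewest edges to add so that the digraph becomes strongly connected."""
  component_count = len(set(component_of.values()))
  if component_count <= 1:
    return 0

  # record which components have a cross-component edge in or out.
  has_incoming: set[int] = set()
  has_outgoing: set[int] = set()
  for source, target in edges:
    a, b = component_of[source], component_of[target]
    if a != b:
      has_outgoing.add(a)
      has_incoming.add(b)

  # sources have no incoming cross edge; sinks have no outgoing one.
  sources = component_count - len(has_incoming)
  sinks = component_count - len(has_outgoing)
  return max(sources, sinks)


# a 3-cycle {a, b, c} (component 0) feeding a 2-cycle {d, e} (component 1).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e"), ("e", "d")]
component_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
print(edges_to_strongly_connect(edges, component_of))  # 1
```

Here one added edge from the sink component back to the source component, for example $d \to a$ , closes the loop. Every vertex, including one with no edges, must appear in `component_of` for the count to be right.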

Dynamic and incremental variants. When edges arrive over time, recomputing SCCs from scratch is wasteful; incremental-SCC and incremental-topological-order algorithms (Bender, Fineman, Gilbert, Tarjan) maintain the ordering under edge insertions in total time well below that of rerunning a linear-time algorithm after every insertion. This is the machinery behind pointer-analysis and build-system dependency engines that must react to each new edge.

Takeaways

A DAG is a directed graph with no directed cycle; a directed graph has a topological order (every edge points forward) if and only if it is acyclic.
DFS detects acyclicity by the absence of back edges, and listing vertices in decreasing finish time yields a topological order in $\Theta(V + E)$ .
Kahn's algorithm peels off in-degree-$0$ sources with a queue, also in $\Theta(V + E)$ , and detects a cycle for free: the queue runs dry with vertices left over exactly when the graph is not a DAG.
A topological order provides a safe evaluation order for acyclically dependent quantities, predecessors first. Collapsing the Fibonacci recursion into its evaluation DAG and sweeping it in topological order turns exponential recursion into a linear pass, the seed of dynamic programming.
Strongly connected components are maximal mutually-reachable vertex sets; contracting them always produces a DAG.
Kosaraju's two-pass DFS (run DFS, transpose, run DFS in decreasing finish order) finds all SCCs in $\Theta(V + E)$ ; Tarjan's low-link method does it in one pass.

¹ Erickson, Ch. 6, Depth-First Search: a digraph is acyclic iff DFS finds no back edge.
² CLRS, Ch. 22, Elementary Graph Algorithms: topological sort by decreasing DFS finish time in $\Theta(V + E)$ .
³ Skiena, §5, Graph Traversal: finding strongly connected components via two DFS passes.
⁴ Tarjan, R. E. (1972), Depth-first search and linear graph algorithms, SIAM Journal on Computing 1(2), 146–160: the single-pass low-link SCC algorithm.
