Minimum Spanning Trees

Suppose you must lay cable to connect a set of towns, and every possible connection has a known cost. You want all the towns linked — any town reachable from any other — while spending as little as possible. Laying a redundant link would only waste money, so the cheapest solution can contain no cycle: it is a tree that spans every town. Finding the cheapest such tree is the minimum spanning tree problem, and it is the first place in this course where a greedy strategy is provably optimal.

The problem

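Stated precisely, in the shape the rest of this lesson will use: given a connected, undirected graph $G = (V, E)$ with a cost $c_e$ on every edge $e \in E$, find a spanning tree $T \subseteq E$ that minimizes the total cost $\sum_{e \in T} c_e$.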

Here a tree means a connected, acyclic graph, not to be confused with a rooted or rooted-and-ordered tree; an MST has no distinguished root. A spanning tree on $n = |V|$ vertices always has exactly $n - 1$ edges: enough to connect everything, yet one fewer than the number that would force a cycle.
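The edge count gives a cheap test for whether a candidate edge set is a spanning tree. The sketch below is only an illustration: the function name `is_spanning_tree` and the tiny inline union-find are assumptions, not part of the lesson's code, and edges are taken to be plain pairs of vertex labels.

```python
def is_spanning_tree(vertices, edges):
    """True if `edges` (pairs of vertices) form a spanning tree of `vertices`."""
    vertices = list(vertices)
    # A spanning tree on n vertices has exactly n - 1 edges.
    if len(edges) != len(vertices) - 1:
        return False

    # Minimal union-find: a root is a vertex that is its own parent.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v

    # n - 1 edges and no cycle together imply connectivity.
    return True


print(is_spanning_tree("abcd", [("a", "b"), ("b", "c"), ("c", "d")]))  # True
print(is_spanning_tree("abcd", [("a", "b"), ("b", "c"), ("a", "c")]))  # False
```

The second call fails because its three edges form a triangle and leave `d` unreached, even though the edge count is right.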

The word tree here admits several equivalent characterizations, any one of which could serve as the definition.

Below, a weighted graph and one of its minimum spanning trees (the thick colored edges). This nine-town graph is the running example for the whole lesson: every trace and every snapshot below runs on it.

A weighted graph with one minimum spanning tree shown in thick edges.

The thick tree spans all nine towns with total weight $1 + 2 + 2 + 4 + 6 + 7 + 7 + 9 = 38$; no spanning tree is cheaper.

Safe edges and the generic method

Every algorithm in this lesson is an instance of one greedy template: maintain a set $A$ of edges that is always a subset of some MST, and at each step add one more safe edge, meaning an edge that can be added to $A$ while keeping $A \subseteq$ (some MST).¹

Algorithm 1: Generic-MST(G, c), the template every MST algorithm instantiates

  $A \gets \emptyset$
  while $A$ is not a spanning tree do
    find an edge $e$ that is safe for $A$    (the only hard step)
    $A \gets A \cup \{e\}$
  return $A$

The invariant holds trivially at initialization ($\emptyset$ is a subset of any MST), it is maintained by the definition of safe, and at termination it does all the work: $A$ has $n - 1$ edges and is contained in some MST $T$, but $T$ also has exactly $n - 1$ edges, so $A = T$. The loop runs exactly $n - 1$ times, once per added edge.

Two things are not obvious. First, a safe edge always exists while $A$ is not yet spanning: the invariant gives an MST $T \supseteq A$, and any edge of $T \setminus A$ is safe by definition. Second, and harder, a safe edge must be recognizable without already knowing an MST; otherwise the template is circular. The entire theory of MSTs reduces to two local certificates: the cut property, which certifies that an edge is safe to include, and the cycle property, which certifies that an edge is safe to exclude.

The cut property

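First, the vocabulary. A cut $(S, V \setminus S)$ is a partition of the vertices into two nonempty groups. An edge crosses the cut if its endpoints lie on opposite sides. A cut respects an edge set $A$ if no edge of $A$ crosses it. An edge crossing the cut is light (or cheapest) if it has the minimum weight of all crossing edges. In these terms the cut property says: if $A$ is contained in some MST, the cut $(S, V \setminus S)$ respects $A$, and $e$ is a light edge crossing that cut, then $e$ is safe for $A$.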

A cut splitting vertices into $S$ and its complement $V \setminus S$, with the light crossing edge $(u, v)$ highlighted.

Proof (an exchange argument, done carefully). Let $T^{*}$ be an MST containing $A$, and suppose, for contradiction, that no MST contains $A \cup \{e\}$; in particular $e \notin T^{*}$. Since $T^{*}$ is connected, at least one edge of $T^{*}$ crosses the cut $(S, V \setminus S)$.

It is tempting to grab any such crossing edge $f \in T^{*}$, swap it for $e$, and argue that $T^{*} - f + e$ is a spanning tree no more expensive than $T^{*}$. This is a mistake: for an arbitrary crossing edge $f$, the graph $T^{*} - f + e$ need not be a tree at all; it can be disconnected and contain a cycle. We must remove the right edge.

So instead: adding $e$ to $T^{*}$ creates a unique cycle, and because $e$ crosses the cut, that cycle must cross back at some edge $g \in T^{*}$ that also crosses $(S, V \setminus S)$. Because the cut respects $A$, we have $g \notin A$. Form

$$T' = T^{*} - \{g\} + \{e\}.$$

Deleting $g$ breaks the unique cycle, so $T'$ is again a spanning tree, and it still contains $A \cup \{e\}$. Since $e$ is the light crossing edge, $c_e \leq c_g$, hence

$$\operatorname{cost}(T') = \operatorname{cost}(T^{*}) - c_g + c_e \leq \operatorname{cost}(T^{*}).$$

But $T^{*}$ was minimum, so equality holds: $T'$ is also an MST, and it contains $e$. Contradiction. Therefore some MST contains $A \cup \{e\}$, i.e. $e$ is safe. $\square$
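The exchange step can be traced in code. The following is a minimal sketch, not the lesson's implementation: `tree_path` and `exchange` are hypothetical helper names, edges are `(u, v, weight)` triples, and the four-vertex example is invented so that the tree passed in really is an MST and the new edge ties with the edge it replaces.

```python
def tree_path(tree_edges, start, goal):
    """Edges on the unique path from `start` to `goal` in a tree."""
    adjacency = {}
    for u, v, w in tree_edges:
        adjacency.setdefault(u, []).append((v, (u, v, w)))
        adjacency.setdefault(v, []).append((u, (u, v, w)))

    # Depth-first search, remembering the edge used to reach each vertex.
    reached_by = {start: None}
    stack = [start]
    while stack:
        vertex = stack.pop()
        for neighbor, edge in adjacency.get(vertex, []):
            if neighbor not in reached_by:
                reached_by[neighbor] = (vertex, edge)
                stack.append(neighbor)

    # Walk back from the goal to the start, collecting the edges used.
    path = []
    vertex = goal
    while reached_by[vertex] is not None:
        vertex, edge = reached_by[vertex]
        path.append(edge)
    return path


def exchange(tree_edges, new_edge, side):
    """Swap `new_edge` in for a tree edge g on its cycle that crosses the cut."""
    u, v, _ = new_edge
    # The cycle closed by new_edge is new_edge plus the tree path between u and v.
    for g in tree_path(tree_edges, u, v):
        if (g[0] in side) != (g[1] in side):
            swapped = [edge for edge in tree_edges if edge != g] + [new_edge]
            return swapped, g
    raise ValueError("new_edge does not cross the cut")


# Hypothetical example: A = {a-b, c-d}, S = {a, b}, and e = a-d is a light crossing edge.
t_star = [("a", "b", 1), ("b", "c", 5), ("c", "d", 2)]
e = ("a", "d", 5)
t_prime, g = exchange(t_star, e, side={"a", "b"})
print(g)                                # ('b', 'c', 5)
print(sum(w for _, _, w in t_prime))    # 8, the same cost as t_star
```

The removed edge is chosen from the cycle that the new edge closes, which is exactly what keeps the result a spanning tree.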

The exchange inside $T^{*}$: adding $e$ creates one cycle; that cycle re-crosses the cut at some tree edge $g \notin A$, and $T^{*} - g + e$ is again a spanning tree, no heavier than $T^{*}$.

Here is the same picture on the nine-town graph. Take $S = \{a, b, c, d, e\}$, the towns above the dashed line. Five edges cross this cut: $a$–$h$ at $8$, $b$–$i$ at $11$, $c$–$i$ at $2$, $d$–$f$ at $14$, and $e$–$f$ at $10$. The light one is $c$–$i$, so the cut property guarantees that $c$–$i$ belongs to some MST, and indeed it is in the thick tree above. Both Kruskal and Prim will commit to $c$–$i$ early, each by building a cut like this one.

A concrete cut on the nine-town graph: $S = \{a, b, c, d, e\}$ above the dashed line. Five edges cross; the light one, $c$–$i$ at weight $2$, is safe.
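Finding the light edge of a cut is a filter followed by a minimum. The sketch below uses the five crossing edges and the set $S$ quoted in the text; the helper names `crossing_edges` and `light_edge` are assumptions, and the extra `a`–`b` edge is a hypothetical placeholder added only to show that edges inside $S$ are filtered out.

```python
def crossing_edges(edges, side):
    """Edges (u, v, weight) with exactly one endpoint in the vertex set `side`."""
    return [(u, v, w) for u, v, w in edges if (u in side) != (v in side)]


def light_edge(edges, side):
    """A minimum-weight edge crossing the cut between `side` and the rest."""
    return min(crossing_edges(edges, side), key=lambda edge: edge[2])


edges = [
    ("a", "h", 8), ("b", "i", 11), ("c", "i", 2), ("d", "f", 14), ("e", "f", 10),
    ("a", "b", 4),  # hypothetical edge inside S (placeholder weight); it does not cross
]
S = {"a", "b", "c", "d", "e"}

print(len(crossing_edges(edges, S)))  # 5
print(light_edge(edges, S))           # ('c', 'i', 2)
```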

Each of the classical MST algorithms (Borůvka's, Prim's, and Kruskal's) is a strategy for choosing which cut to apply the property to, and all three only ever add edges that the cut property certifies as safe.
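To make the "choose a cut, take its light edge" recipe concrete, here is one possible instantiation of the generic template. It is a deliberately naive sketch under stated assumptions: the function name `generic_mst`, the `(u, v, weight)` edge triples, and the example graph are all invented for illustration, and the cut is always taken around the component of the first vertex, which is only one of many valid choices.

```python
def generic_mst(vertices, edges):
    """Grow the edge set A one safe edge at a time, justified by the cut property."""
    vertices = list(vertices)
    component = {v: v for v in vertices}  # component label of each vertex in (V, A)
    chosen = []                           # the edge set A

    while len(chosen) < len(vertices) - 1:
        # Pick a cut that respects A: one whole component against everything else.
        label = component[vertices[0]]
        side = {v for v in vertices if component[v] == label}
        crossing = [(u, v, w) for u, v, w in edges if (u in side) != (v in side)]
        if not crossing:
            raise ValueError("graph is not connected")

        # A light edge across a cut that respects A is safe.
        u, v, w = min(crossing, key=lambda edge: edge[2])
        chosen.append((u, v, w))

        # Merge the two components the new edge joins.
        old, new = component[u], component[v]
        for x in vertices:
            if component[x] == old:
                component[x] = new
    return chosen


# Hypothetical four-vertex graph.
example = [("a", "b", 4), ("b", "c", 8), ("a", "c", 3), ("c", "d", 1)]
tree = generic_mst("abcd", example)
print(sum(w for _, _, w in tree))  # 8
```

Each iteration rescans every edge, so this runs in roughly $O(VE)$ time; the point is the correctness argument, not speed.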

The cycle property

The cut property justifies including edges; its mirror image justifies discarding them: an edge that is the strictly heaviest edge on some cycle of $G$ belongs to no MST.²

The fine print matters: the theorem speaks only about edges that lie on a cycle. An edge on no cycle is a bridge, and a bridge is in every spanning tree no matter how expensive it is. The heaviest edge of the graph is not the same as the heaviest edge on a cycle:

Left: on the cycle $w$–$x$–$y$–$z$, the strict maximum (weight $8$, dashed) is in no MST. Right: the weight-$9$ edge is the heaviest in its graph, yet it is a bridge, so every spanning tree must include it.

Together the two properties settle the uniqueness question: if all edge weights are distinct, the MST is unique.

Distinctness is sufficient but not necessary. The nine-town graph has ties (two edges of weight $2$, three of weight $7$) yet a unique MST: every one of the five non-tree edges is the strict maximum on the cycle it closes ($i$–$h$ at $7$ closes a cycle whose other edges weigh $6$ and $1$; $a$–$h$ at $8$ beats a path of maximum weight $7$; and so on), so by the cycle property none of them is in any MST, which forces the remaining eight edges.
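The uniqueness argument above is mechanical enough to check in code: for each non-tree edge, compare its weight with the heaviest edge on the tree path between its endpoints. The sketch below is an assumption-laden illustration, with hypothetical names `heaviest_on_path` and `certifies_unique_mst` and a made-up triangle on vertices `x`, `y`, `z` that merely reuses the weights $6$, $1$, and $7$ from the text.

```python
def heaviest_on_path(tree_edges, start, goal):
    """Maximum edge weight on the tree path from `start` to `goal`."""
    adjacency = {}
    for u, v, w in tree_edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    # Depth-first search that carries the heaviest weight seen so far.
    stack = [(start, None, float("-inf"))]
    while stack:
        vertex, parent, heaviest = stack.pop()
        if vertex == goal:
            return heaviest
        for neighbor, weight in adjacency.get(vertex, []):
            if neighbor != parent:
                stack.append((neighbor, vertex, max(heaviest, weight)))
    raise ValueError("goal is not reachable in the tree")


def certifies_unique_mst(tree_edges, other_edges):
    """True if every non-tree edge is the strict maximum on the cycle it closes."""
    return all(
        w > heaviest_on_path(tree_edges, u, v) for u, v, w in other_edges
    )


tree = [("x", "y", 6), ("y", "z", 1)]                 # hypothetical endpoints
print(certifies_unique_mst(tree, [("x", "z", 7)]))    # True: 7 > max(6, 1)
print(certifies_unique_mst(tree, [("x", "z", 6)]))    # False: a tie with the 6
```

A `False` result does not prove that the tree is wrong, only that this certificate fails; with a tie, swapping the two equal-weight edges yields another tree of the same cost.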

Borůvka's algorithm

The oldest MST algorithm (Otakar Borůvka, 1926) is also the most directly cut-rule driven, and it parallelises well.² The idea: every component, in every round, simultaneously selects its own cheapest outgoing edge.

Maintain a forest $A$, initially the $n$ isolated vertices. In each round, each current component $C$ looks at the cut $(C, V \setminus C)$, which respects $A$, and selects its lightest crossing edge. By the cut property every such edge is safe, so we add them all at once (breaking ties by one fixed total order on the edges, so that the chosen edges cannot close a cycle) and merge the components they join. Every component merges with at least one neighbor, so the number of components at least halves each round, and only $O(\log V)$ rounds are needed.

Algorithm 2: Borůvka(G, c), in which every component grabs its cheapest exit edge

  $A \gets \emptyset$    ($n$ singleton components)
  while $A$ has more than one component do
    foreach component $C$ of $(V, A)$ do
      $e_C \gets$ the lightest edge crossing $(C, V \setminus C)$    (safe by the cut rule)
    foreach distinct edge $e_C$ chosen do
      $A \gets A \cup \{e_C\}$    (add all exit edges at once)
  return $A$

Running time. Each round scans all edges to find the component-minimum exits in $O(E)$ time, and there are $O(\log V)$ rounds, so Borůvka runs in $O(E \log V)$: the same headline bound as Prim and Kruskal, but with the useful property that the per-round work is fully parallel.

boruvka.py

from collections.abc import Hashable
from typing import Optional, TypeVar

from graph import Edge, Graph
from union_find import UnionFind

Label = TypeVar("Label", bound=Hashable)

def boruvka(graph: Graph[Label]) -> list[Edge[Label]]:
  """
    A minimum spanning tree of a connected, undirected `graph`, returned\n
    as the list of its tree edges. On a disconnected graph this returns a\n
    minimum spanning forest (each component is spanned).\n
  """
  components: UnionFind[Label] = UnionFind(graph.labels)
  tree_edges: list[Edge[Label]] = []

  # a total order on edges for tie-breaking: weight, then position in the edge list.
  indexed_edges: list[tuple[int, Edge[Label]]] = list(enumerate(graph.edges()))

  def edge_key(entry: tuple[int, Edge[Label]]) -> tuple[float, int]:
    index, edge = entry
    return (edge.weight, index)

  while components.count > 1:
    # cheapest safe exit edge leaving each component this round.
    cheapest_exit: dict[Label, tuple[int, Edge[Label]]] = {}
    for index, edge in indexed_edges:
      source_root: Label = components.find(edge.source.label)
      target_root: Label = components.find(edge.target.label)
      if source_root == target_root:
        continue

      # this edge crosses out of both endpoints' components; offer it to each.
      candidate: tuple[int, Edge[Label]] = (index, edge)
      for root in (source_root, target_root):
        current: Optional[tuple[int, Edge[Label]]] = cheapest_exit.get(root)
        if current is None or edge_key(candidate) < edge_key(current):
          cheapest_exit[root] = candidate

    # no component has an exit edge => the graph (or remainder) is split.
    if not cheapest_exit:
      break

    # add every chosen exit edge at once; union skips any that now form a cycle
    # (when two components picked the same edge from opposite sides).
    for _, edge in cheapest_exit.values():
      if components.union(edge.source.label, edge.target.label):
        tree_edges.append(edge)
  return tree_edges

graph.py

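# postpone annotation evaluation: Edge's signature names Vertex before Vertex is defined.
from __future__ import annotations

from collections.abc import Hashable, Iterator
from typing import Generic, Optional, TypeVar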


Label = TypeVar("Label", bound=Hashable)


class Edge(Generic[Label]):
  """
    A directed connection from `source` to `target`, carrying a weight.\n
  """

  def __init__(
    self,
    source: Vertex[Label],
    target: Vertex[Label],
    weight: float = 1.0,
  ) -> None:
    self.source: Vertex[Label] = source
    self.target: Vertex[Label] = target
    self.weight: float = weight

  def __repr__(self) -> str:
    return f"Edge({self.source.label!r} -> {self.target.label!r}, w={self.weight})"


class Vertex(Generic[Label]):
  """
    A graph vertex: a label plus the list of edges leaving it.\n
  """

  def __init__(self, label: Label) -> None:
    self.label: Label = label
    self.outgoing: list[Edge[Label]] = []

  def neighbors(self) -> list[Vertex[Label]]:
    """
      The vertices reachable from this one by a single edge.\n
    """
    return [edge.target for edge in self.outgoing]

  def edge_to(self, label: Label) -> Optional[Edge[Label]]:
    """
      The outgoing edge to the vertex with `label`, or None.\n
    """
    for edge in self.outgoing:
      if edge.target.label == label:
        return edge
    return None

  def __repr__(self) -> str:
    return f"Vertex({self.label!r})"


class Graph(Generic[Label]):
  """
    A graph of Vertex objects linked by Edge objects.\n
    Pass `directed=True` for a digraph; otherwise each `add_edge` inserts\n
    the reverse edge too.\n
  """

  def __init__(self, directed: bool = False) -> None:
    self.directed: bool = directed
    self._vertices: dict[Label, Vertex[Label]] = {}

  def add_vertex(self, label: Label) -> Vertex[Label]:
    """
      Return the vertex for `label`, creating it if it is absent.\n
    """
    # reuse the existing vertex, or mint and register a fresh one.
    vertex = self._vertices.get(label)
    if vertex is None:
      vertex = Vertex(label)
      self._vertices[label] = vertex
    return vertex

  def add_edge(
    self,
    source_label: Label,
    target_label: Label,
    weight: float = 1.0,
  ) -> None:
    """
      Connect two labels (creating either vertex as needed).\n
      Adds the reverse edge as well when the graph is undirected.\n
    """
    source = self.add_vertex(source_label)
    target = self.add_vertex(target_label)

    # link source to target, and mirror it back when undirected.
    source.outgoing.append(Edge(source, target, weight))
    if not self.directed:
      target.outgoing.append(Edge(target, source, weight))

  def vertex(self, label: Label) -> Vertex[Label]:
    """
      The vertex carrying `label` (raises KeyError if absent).\n
    """
    return self._vertices[label]

  @property
  def vertices(self) -> list[Vertex[Label]]:
    """
      Every vertex, in insertion order.\n
    """
    return list(self._vertices.values())

  @property
  def labels(self) -> list[Label]:
    """
      Every vertex label, in insertion order.\n
    """
    return list(self._vertices)

  def edges(self) -> Iterator[Edge[Label]]:
    """
      Each edge once — an undirected edge is yielded a single time.\n
    """
    # track undirected endpoint pairs so each is emitted only once.
    seen: set[frozenset[Label]] = set()

    for vertex in self._vertices.values():
      for edge in vertex.outgoing:
        # skip an undirected edge already yielded from the other endpoint.
        if not self.directed:
          endpoints = frozenset((edge.source.label, edge.target.label))
          if endpoints in seen:
            continue
          seen.add(endpoints)

        yield edge

  def __contains__(self, label: Label) -> bool:
    return label in self._vertices

  def __iter__(self) -> Iterator[Vertex[Label]]:
    return iter(self._vertices.values())

  def __len__(self) -> int:
    return len(self._vertices)

union_find.py

from collections.abc import Hashable, Iterable
from typing import Generic, TypeVar, cast


Element = TypeVar("Element", bound=Hashable)


class DisjointSetNode(Generic[Element]):
  """
    One element's node: its value, its parent link, and its rank.\n
    A node is its own parent exactly when it is the root of its set.\n
  """

  def __init__(self, value: Element) -> None:
    self.value: Element = value
    self.parent: DisjointSetNode[Element] = self
    self.rank: int = 0

  def __repr__(self) -> str:
    return f"DisjointSetNode({self.value!r})"


class UnionFind(Generic[Element]):
  """
    A collection of disjoint sets over hashable elements.\n
  """

  def __init__(self, elements: int | Iterable[Element] = 0) -> None:
    """
      Seed the structure. An int `n` creates singletons `0..n-1`;\n
      an iterable creates one singleton node per member.\n
    """
    # a seed count `n` means the elements are 0..n-1 (ints standing in for
    # Element); cast keeps the type checker happy about that substitution.
    members: Iterable[Element] = (
      cast("Iterable[Element]", range(elements))
      if isinstance(elements, int)
      else elements
    )
    # one singleton node per seeded member.
    self._nodes: dict[Element, DisjointSetNode[Element]] = {
      value: DisjointSetNode(value) for value in members
    }
    self.count: int = len(self._nodes)

  def add(self, value: Element) -> None:
    """
      Add `value` as a new singleton set if it is absent.\n
    """
    if value not in self._nodes:
      self._nodes[value] = DisjointSetNode(value)
      self.count += 1

  def _find_root(self, value: Element) -> DisjointSetNode[Element]:
    """
      The root node of `value`'s set, compressing the path on the way.\n
    """
    # first pass: climb parent links to the root of the set.
    node = self._nodes[value]
    root = node
    while root.parent is not root:
      root = root.parent

    # second pass: point every node on the path straight at the root.
    while node.parent is not root:
      node.parent, node = root, node.parent

    return root

  def find(self, value: Element) -> Element:
    """
      The representative value of `value`'s set.\n
    """
    return self._find_root(value).value

  def union(self, first: Element, second: Element) -> bool:
    """
      Merge the sets containing `first` and `second`.\n
      Returns False if they already shared a set.\n
    """
    # already in the same set: nothing to merge.
    first_root = self._find_root(first)
    second_root = self._find_root(second)
    if first_root is second_root:
      return False

    # hang the shorter tree under the taller one.
    if first_root.rank < second_root.rank:
      first_root, second_root = second_root, first_root
    second_root.parent = first_root

    # equal ranks: the merged tree grows one level taller.
    if first_root.rank == second_root.rank:
      first_root.rank += 1

    self.count -= 1
    return True

  def connected(self, first: Element, second: Element) -> bool:
    """
      Whether `first` and `second` belong to the same set.\n
    """
    return self._find_root(first) is self._find_root(second)

One round on six isolated vertices shows the parallel grab. Each singleton (a component of one) points an arrow along its own cheapest incident edge. Vertices $a$ and $b$ pick each other (both cheapest at $3$), as do $d$ and $e$ (at $2$) and $c$ and $f$ (at $4$), so the six arrows name only three distinct edges, and one sweep merges six components into three:

One Borůvka round: every component (here singletons) selects its cheapest exit edge (arrows). The three distinct chosen edges merge six components into three.
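The same round can be replayed in a few lines. This standalone sketch does not use the lesson's `Graph` or `UnionFind` classes; the function name `boruvka_round` is an assumption, and only the three weights $3$, $2$, and $4$ come from the text, while the heavier edges `b`–`c`, `e`–`f`, and `a`–`d` are hypothetical fillers so that every vertex has a choice to make.

```python
def boruvka_round(component, edges):
    """One round: each component picks its cheapest exit edge.

    `component` maps vertex -> component label; edges are (u, v, weight).
    Returns the set of distinct chosen edges.
    """
    cheapest = {}
    for index, (u, v, w) in enumerate(edges):
        label_u, label_v = component[u], component[v]
        if label_u == label_v:
            continue  # both endpoints in one component: not an exit edge
        for label in (label_u, label_v):
            best = cheapest.get(label)
            # Compare (weight, index) so ties break the same way for everyone.
            if best is None or (w, index) < best[0]:
                cheapest[label] = ((w, index), (u, v, w))
    return {edge for _, edge in cheapest.values()}


singletons = {v: v for v in "abcdef"}
edges = [
    ("a", "b", 3), ("d", "e", 2), ("c", "f", 4),   # weights from the text
    ("b", "c", 7), ("e", "f", 6), ("a", "d", 9),   # hypothetical heavier edges
]
chosen = boruvka_round(singletons, edges)
print(sorted(chosen))  # [('a', 'b', 3), ('c', 'f', 4), ('d', 'e', 2)]
```

Six components each name an edge, but pairs of neighbors name the same one, so only three distinct edges come back.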

How fast can an MST be found?

The three classical algorithms all land at $O(m \log n)$, writing $n = |V|$ and $m = |E|$. Is the log necessary? The answer, developed over decades, is essentially no, and Borůvka's rounds appear in many of the improvements.

Borůvka as an accelerator. A single Borůvka phase costs $O(m)$ and at least halves the vertex count, contracting each component to a single super-vertex. Running $\log \log n$ phases leaves at most $n / \log n$ super-vertices, so finishing with a Fibonacci-heap Prim, which costs $O(m + n' \log n')$ on $n'$ vertices, adds only $O(m + n)$, and the total is $O(m \log \log n)$. Interleaving contraction with a Fibonacci-heap priority queue gives Fredman and Tarjan's $O(m \log^{*} n)$ (1987), where $\log^{*}$, the iterated logarithm, is at most $5$ for any input that fits in the universe.³ The pattern is always the same: use Borůvka to shrink the graph cheaply, then spend the expensive per-edge work on a much smaller instance.
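Contraction itself is a small bookkeeping step. The sketch below shows one way it might look, assuming components are given as a vertex-to-label map; the name `contract`, the labels `"ab"`, `"cf"`, `"de"`, and all edge weights other than $3$, $2$, and $4$ are hypothetical. Each surviving super-edge remembers the original edge it stands for, since that is what eventually goes into the MST.

```python
def contract(component, edges):
    """Contract each component to a super-vertex.

    Drops edges inside a component and keeps only the lightest edge between
    each pair of components, as (label_u, label_v, weight, original_edge).
    """
    lightest = {}
    for u, v, w in edges:
        label_u, label_v = component[u], component[v]
        if label_u == label_v:
            continue  # would become a self-loop after contraction
        pair = frozenset((label_u, label_v))
        if pair not in lightest or w < lightest[pair][2]:
            lightest[pair] = (label_u, label_v, w, (u, v))
    return list(lightest.values())


# Components after one hypothetical Borůvka round on six vertices.
component = {"a": "ab", "b": "ab", "c": "cf", "f": "cf", "d": "de", "e": "de"}
edges = [
    ("a", "b", 3), ("d", "e", 2), ("c", "f", 4),
    ("b", "c", 7), ("e", "f", 6), ("a", "d", 9),
]
contracted = contract(component, edges)
print(sorted(w for _, _, w, _ in contracted))  # [6, 7, 9]
```

Six vertices and six edges shrink to three super-vertices and three super-edges, which is the smaller instance the next phase (or Prim) then works on.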

The randomized linear-time algorithm. Karger, Klein, and Tarjan (1995) gave an MST algorithm running in $O(m)$ expected time.⁴ It alternates Borůvka contraction with a sampling step: pick each edge independently with probability $\frac{1}{2}$, recursively find the minimum spanning forest $F$ of the sample, then use $F$ to discard every edge that is $F$-heavy (heavier than the heaviest edge on the path in $F$ between its endpoints: a cycle-property rejection in bulk). A linear-time MST verification procedure identifies the discards, and a sampling lemma bounds the expected number of surviving edges by $O(n)$, collapsing the recursion to linear expected work. It was the first MST algorithm with a linear expected running time.
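The $F$-heavy test is the cycle property applied against a forest. The following sketch is a naive stand-in, not the linear-time verification procedure the algorithm relies on: it walks the forest path for every edge, the names `path_maximum` and `discard_f_heavy` are assumptions, and the small forest is invented for illustration.

```python
def path_maximum(forest_edges, start, goal):
    """Heaviest weight on the forest path start..goal, or None if disconnected."""
    adjacency = {}
    for u, v, w in forest_edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    stack = [(start, None, float("-inf"))]
    while stack:
        vertex, parent, heaviest = stack.pop()
        if vertex == goal:
            return heaviest
        for neighbor, weight in adjacency.get(vertex, []):
            if neighbor != parent:
                stack.append((neighbor, vertex, max(heaviest, weight)))
    return None  # the endpoints lie in different trees of the forest


def discard_f_heavy(forest_edges, edges):
    """Keep the F-light edges; drop every edge heavier than its forest path."""
    kept = []
    for u, v, w in edges:
        heaviest = path_maximum(forest_edges, u, v)
        if heaviest is None or w <= heaviest:
            kept.append((u, v, w))
    return kept


# Hypothetical sample forest on {a, b, c}, with d left isolated.
forest = [("a", "b", 2), ("b", "c", 5)]
candidates = [("a", "c", 9), ("c", "d", 7), ("a", "b", 2)]
print(discard_f_heavy(forest, candidates))  # [('c', 'd', 7), ('a', 'b', 2)]
```

The edge `a`–`c` at weight 9 is dropped because the forest already joins its endpoints by a path whose heaviest edge weighs 5; `c`–`d` survives because the forest does not connect its endpoints at all.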

The deterministic frontier. Whether a deterministic linear-time MST algorithm exists is still open. Chazelle (2000) came closest with $O(m \, \alpha(m, n))$, where $\alpha$ is the inverse Ackermann function, using a data structure called the soft heap, which deliberately corrupts a few keys to run faster. Pettie and Ramachandran (2002) gave a provably optimal deterministic algorithm whose running time matches the (unknown) decision-tree complexity of the problem up to a constant factor: optimal without anyone knowing what that optimum is.⁵

The acceleration pattern behind fast MST algorithms: a Borůvka phase contracts each component to a super-vertex in $O(m)$ time, at least halving $n$, so the expensive work runs on a smaller graph. Repeated contraction drives the log factor down toward a constant.

The practical takeaway matches the theory: a common recipe for fast MST code is to run a couple of Borůvka rounds to shrink the graph, then finish with Prim or Kruskal. The lesson continues with Kruskal and Prim, the two algorithms you will actually implement: one grows a forest with union-find, the other a single tree with a priority queue.

¹ CLRS, Ch. 23, Minimum Spanning Trees: the generic method, safe edges, and the cut property identifying a safe edge for the greedy MST template.
² Erickson, Ch. 8, Minimum Spanning Trees: the cut and cycle properties, and Borůvka's component-merging rounds in $O(\log V)$ phases.
³ Fredman, M. L. & Tarjan, R. E. (1987), Fibonacci heaps and their uses in improved network optimization algorithms, Journal of the ACM 34(3), 596–615: the $O(m \log^{*} n)$ MST bound.
⁴ Karger, D. R., Klein, P. N. & Tarjan, R. E. (1995), A randomized linear-time algorithm to find minimum spanning trees, Journal of the ACM 42(2), 321–328: expected linear-time MST via sampling and Borůvka contraction.
⁵ Pettie, S. & Ramachandran, V. (2002), An optimal minimum spanning tree algorithm, Journal of the ACM 49(1), 16–34: a provably optimal deterministic MST algorithm; and Chazelle, B. (2000), A minimum spanning tree algorithm with inverse-Ackermann type complexity, Journal of the ACM 47(6), 1028–1047.