Minimum Spanning Trees
Given a weighted network, how do we connect everything as cheaply as possible? The answer is a minimum spanning tree.
╌╌╌╌
Suppose you must lay cable to connect a set of towns, and every possible connection has a known cost. You want all the towns linked — any town reachable from any other — while spending as little as possible. Laying a redundant link would only waste money, so the cheapest solution can contain no cycle: it is a tree that spans every town. Finding the cheapest such tree is the minimum spanning tree problem, and it is the first place in this course where a greedy strategy is provably optimal.
The problem
Stated precisely, in the shape the rest of this lesson will use:
Here a tree means a connected, acyclic graph, not to be confused with a rooted or rooted-and-ordered tree; an MST has no distinguished root. A spanning tree on vertices always has exactly edges: enough to connect everything, one fewer than would create a cycle.
It is worth recalling why tree
is exactly the right object. For a graph
the following are equivalent, and any one of them could serve as the
definition.
Below, a weighted graph (left) and one of its minimum spanning trees (the thick colored edges):
The thick tree spans all nine towns with total weight ; no spanning tree is cheaper.
The cut property
All three algorithms below are instances of a single greedy template: maintain a set of edges that is always a subset of some MST, and at each step add one more safe edge, meaning an edge that can be added to while keeping (some MST). The entire theory of MSTs reduces to recognizing safe edges, and one lemma does that for us.1
First, the vocabulary. A cut is a partition of the vertices into two groups. An edge crosses the cut if its endpoints lie on opposite sides. A cut respects an edge set if no edge of crosses it. An edge crossing the cut is light (or cheapest) if it has the minimum weight of all crossing edges.
The cut property is the engine. Every correct MST algorithm, whether Borůvka's, Prim's, or Kruskal's, is just a strategy for choosing which cut to apply it to, and all three only ever add edges that obey the cut rule.
Borůvka's algorithm
The oldest MST algorithm (Otakar Borůvka, 1926) is also the most directly cut-rule driven, and it parallelises well.2 The idea: every component, in every round, simultaneously reaches out along its own cheapest outgoing edge.
Maintain a forest , initially the isolated vertices. In each round, each current component looks at the cut , which respects , and selects its lightest crossing edge. By the cut property every such edge is safe, so we add them all at once and merge the components they join. Because every component at least halves in count each round (each one merges with at least one neighbor), the number of components drops by a factor per round, so only rounds are needed.
- 1singleton components
- 2while has more than one component do
- 3foreach component of do
- 4the lightest edge crossingsafe by cut rule
- 5foreach distinct edge chosen do
- 6add all exit edges at once
- 7return
Running time. Each round scans all edges to find component-minimum exits in time and there are rounds, so runs in , the same headline bound as Prim and Kruskal, but with the useful property that the per-round work is fully parallel.
One round on six isolated vertices shows the parallel grab. Each singleton (a component of one) points an arrow along its own cheapest incident edge; the chosen edges, taken together, merge the six components down to two in a single sweep:
Kruskal's algorithm
's strategy is global and edge-centric: consider the edges in order of increasing weight, and add each one unless it would form a cycle. The set is a forest (an acyclic graph) that gradually coalesces into a single tree once is fully connected.
Why is each added edge safe? When Kruskal accepts the cheapest remaining edge , it connects two different trees of the current forest. Take to be the vertices of 's tree. This cut respects (no forest edge crosses out of a tree), and is the lightest edge crossing it; any lighter crossing edge would have been considered earlier and would have joined the trees already. By the cut property, is safe.
The cycle test, are and already in the same tree?
, is exactly the
question : does adding
keep the graph acyclic? This is what the
disjoint-set (union-find) data
structure answers efficiently. It supports ,
(, which component is in?), and (merge two
components).
- 1
- 2foreach vertex do
- 3call
- 4sort the edges of into nondecreasing order by weight
- 5foreach edge in sorted order do
- 6if thenstays acyclic
- 7safe edge
- 8call
- 9return
Running on the nine-town graph above makes the accept/reject rhythm visible. We scan the thirteen edges in nondecreasing weight; the first eight that join distinct trees are accepted, and the five that would close a cycle are skipped. The eight accepted edges are exactly the MST of weight :
Making it fast: union by size
The simplest implementation keeps an array where names 's current component. Then is , but a naive that relabels one whole side costs in the worst case. The fix is the classic union-by-size argument:
- Alongside , keep , a linked list of the vertices currently in component , so we can enumerate a component cheaply.
- On , relabel the smaller component: choose the side with and rewrite for the vertices in that smaller side.
Why this wins: whenever a vertex has its label rewritten, it sat in the smaller of the two merged components, so the component containing at least doubles in size. A vertex can be doubled at most times before it is in a component of all vertices. Hence each vertex is relabelled times across the whole run, and the total work spent updating over all unions is .
Running time. Sorting the edges dominates: since . The union-find work adds only with union by size (and is effectively linear, , with the inverse-Ackermann tricks of union-by-rank plus path compression). So overall runs in
Prim's algorithm
's strategy is local and vertex-centric: grow a single tree outward from an arbitrary root, repeatedly attaching the cheapest edge that links a tree vertex to a non-tree vertex. Here the set is always one connected tree.3
Safety is again the cut property. Let be the vertices currently in the tree. The cut respects , and Prim deliberately picks the lightest edge crossing it, a light edge, so the chosen edge is safe.
This is exactly with the relaxation rule changed. Where Dijkstra keys a frontier vertex by (distance from the source), Prim keys it by alone (cost to attach to the finished set). Prim keeps every non-tree vertex in a min-priority queue keyed by , maintaining the invariant
Extracting the minimum yields the next vertex to absorb; absorbing it may make its neighbors cheaper to reach, so we relax their keys with .
- 1foreach vertex do
- 2
- 3
- 4start tree at
- 5min-PQ keyed by
- 6
- 7while do
- 8cheapest to attach
- 9if then
- 10commit safe edge
- 11foreach adjacent to with do
- 12if then
- 13's best link to tree
- 14Decrease-Key
- 15return
A single Prim step looks like this. The tree has been grown from root ; every frontier vertex carries a key equal to its cheapest edge into . The cut respects the tree, and returns the endpoint of the lightest crossing edge — here at weight — which the cut property certifies as safe:
Running time. Prim performs operations and up to operations, so in general . With a binary heap all three operations cost , giving , the same as Kruskal. With a Fibonacci heap, drops to amortized while stays , improving the total to
Notice that when the graph is dense, , this is , that is, linear, and asymptotically the best known.
Which to use?
Throughout, and .
| Borůvka | Kruskal | Prim | |
|---|---|---|---|
| Grows | every component at once | a forest, globally | a single tree, from a root |
| Core structure | component scan / union-find | union-find (union by size) | priority queue |
| Headline time | |||
| Best variant | parallel rounds | work after sort | (Fib. heap) |
| Shines when | parallel / distributed | edges already sorted; sparse | dense; adjacency-rich data |
All three are correct for the same reason, the cut property, and all three are greedy algorithms whose greediness is provably optimal, a rarity worth savoring. If edge weights are not distinct the MST may not be unique, but every choice these algorithms make is still safe, so they always return a minimum spanning tree. (The fastest known practical approaches combine Borůvka's rounds with Prim/Kruskal to shave the bound further.)
Takeaways
- A minimum spanning tree connects all vertices of a connected weighted
graph with exactly edges of least total weight;
tree
means connected and acyclic, no root. - The cut property (cut rule) is the one lemma behind every MST algorithm: a light edge crossing a cut that respects the current edge set is always safe. The exchange argument must swap out the edge on the cycle created by adding ; swapping an arbitrary crossing edge is a tempting mistake.
- adds every component's cheapest exit edge each round; halving the component count gives rounds and total, with naturally parallel rounds.
- adds edges cheapest-first, skipping cycle-forming ones, using union-find; sorting dominates at , and union by size keeps the relabelling work to because each vertex's component can double only times.
- is rekeyed by attachment cost: grow one tree from a start vertex, attaching the lightest crossing edge via a priority queue; with a binary heap, with a Fibonacci heap, linear on dense graphs.
- Greedy is optimal here, a clean, provable success of the greedy paradigm.
Footnotes
╌╌ END ╌╌