Shortest Paths

Every navigation app, every network router, every game pathfinder is solving the same problem: given a weighted graph, find the cheapest route from one place to another. BFS already solved this when every edge counts as one step. Now the edges carry weights (distances, times, costs), and we want to minimize the total weight along a path. This lesson builds the shortest-path toolkit from a single primitive shared by every algorithm in it.

The problem and its primitive

Every algorithm maintains two arrays, with a source vertex $s$ fixed in advance. For each vertex $v$, the estimate $v.d$ is an upper bound on the true shortest-path distance $\delta(s, v)$: it is always $\geq \delta(s, v)$ and only shrinks toward it. The predecessor $v.\pi$ records the previous vertex on the best path found so far; together the predecessors form a shortest-path tree. We initialize $s.d = 0$ and $v.d = \infty$ for every other vertex, with every $v.\pi = \text{nil}$.

The one operation that updates these estimates is relaxation: testing whether going through $u$ improves our route to $v$.

Algorithm 1: $\textsc{Relax}(u, v, w)$, which tries the edge $(u, v)$ as a shortcut to $v$

1  if $u.d + w(u, v) < v.d$ then
2      $v.d \gets u.d + w(u, v)$    // cheaper route to $v$ via $u$
3      $v.\pi \gets u$

Relaxation only ever lowers an estimate, and it never pushes one below the true distance. Every shortest-path algorithm below is a different discipline for deciding which edges to relax, and in what order. Two facts make relaxation work: the triangle inequality $\delta(s, v) \leq \delta(s, u) + w(u, v)$, and optimal substructure (any subpath of a shortest path is itself a shortest path).¹ Optimal substructure is what makes both greedy and dynamic-programming approaches viable.

One relaxation step looks like this. Before, $v$'s best-known route costs $9$; we test the edge $(u, v)$ of weight $3$ against $u$'s settled estimate $u.d = 5$. Since $5 + 3 = 8 < 9$, the edge is a shortcut: $v.d$ drops to $8$ and $v.\pi$ is rewired to point back through $u$.

One $\textsc{Relax}(u, v, w)$ step. The edge $u \to v$ beats $v$'s current estimate ($5 + 3 < 9$), so $v.d$ falls to $8$ and its predecessor is rewired to $u$.
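A minimal Python sketch of the primitive, assuming estimates and predecessors live in plain dictionaries keyed by vertex name (the names `initialize` and `relax` are illustrative, not taken from the source). It replays the step above: $5 + 3 < 9$ lowers $v.d$ to $8$, and a second attempt changes nothing because relaxation only ever lowers an estimate.

```python
import math
from typing import Optional


def initialize(vertices: list[str], s: str) -> tuple[dict[str, float], dict[str, Optional[str]]]:
    """s.d = 0, every other v.d = infinity, and no predecessors yet."""
    d = {v: math.inf for v in vertices}
    pi: dict[str, Optional[str]] = {v: None for v in vertices}
    d[s] = 0.0
    return d, pi


def relax(d: dict[str, float], pi: dict[str, Optional[str]], u: str, v: str, w: float) -> bool:
    """Algorithm 1: take the edge (u, v) of weight w if it is a strictly cheaper route to v.

    Returns True when v.d was lowered (and v.pi rewired), False otherwise.
    """
    if d[u] + w < d[v]:
        d[v] = d[u] + w   # cheaper route to v via u
        pi[v] = u         # v's best route now arrives through u
        return True
    return False


# The step from the text: u.d = 5, v.d = 9, edge (u, v) of weight 3 (predecessors omitted).
d = {"u": 5.0, "v": 9.0}
pi = {"u": None, "v": None}
print(relax(d, pi, "u", "v", 3.0), d["v"], pi["v"])  # True 8.0 u
print(relax(d, pi, "u", "v", 3.0), d["v"], pi["v"])  # False 8.0 u
```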

We will trace the algorithms on this small weighted digraph. The negative edge $a \to b$ of weight $-2$ is harmless here (there is no negative cycle), but edges like it are what break Dijkstra and force a dynamic program.

A small weighted digraph with one negative edge $a \to b$ of weight $-2$, used to trace the algorithms.

Dijkstra's algorithm

When all edge weights are non-negative, we can be greedy. Dijkstra's algorithm grows a set $S$ of vertices whose shortest distances are finalized. At each step it picks the non-finalized vertex $u$ with the smallest estimate $u.d$, finalizes it, and relaxes its outgoing edges. A min-priority queue keyed by $d$ supplies the next vertex.

Algorithm 2: $\textsc{Dijkstra}(G, w, s)$, single-source shortest paths for non-negative weights

1   foreach vertex $v \in V$ do
2       $v.d \gets \infty$
3       $v.\pi \gets \text{nil}$
4   $s.d \gets 0$
5   $S \gets \emptyset$
6   $Q \gets V$    // min-priority queue keyed by $d$
7   while $Q \neq \emptyset$ do
8       $u \gets \textsc{Extract-Min}(Q)$    // closest unfinalized vertex
9       $S \gets S \cup \{u\}$    // $u.d$ is now final
10      foreach $v$ adjacent to $u$ do
11          call $\textsc{Relax}(u, v, w)$    // Decrease-Key updates $Q$
12  return $d$ and $\pi$
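One way to transcribe Algorithm 2 almost line for line, under the assumption that $Q$ is kept as an unsorted set rather than a heap: $\textsc{Extract-Min}$ becomes a linear scan and $\textsc{Decrease-Key}$ is simply the assignment inside $\textsc{Relax}$. The function name `dijkstra_scan` and the adjacency-list encoding are hypothetical; the example graph is the one traced in the complete run.

```python
import math
from typing import Optional

# vertex -> list of (neighbor, weight); every vertex must appear as a key.
AdjacencyList = dict[str, list[tuple[str, float]]]


def dijkstra_scan(graph: AdjacencyList, s: str) -> tuple[dict[str, float], dict[str, Optional[str]]]:
    """Algorithm 2 with Q stored as a plain set of unfinalized vertices.

    Extract-Min is a linear scan (O(V) each) and Decrease-Key is the
    assignment inside Relax (O(1)), so the run costs O(V^2 + E).
    Assumes every edge weight is non-negative.
    """
    d = {v: math.inf for v in graph}                          # lines 1-3
    pi: dict[str, Optional[str]] = {v: None for v in graph}
    d[s] = 0.0                                                # line 4
    S: set[str] = set()                                       # line 5
    Q = set(graph)                                            # line 6: Q <- V
    while Q:                                                  # line 7
        u = min(Q, key=lambda v: d[v])                        # line 8: Extract-Min by scanning Q
        Q.remove(u)
        S.add(u)                                              # line 9: u.d is now final
        for v, w in graph[u]:                                 # line 10
            if d[u] + w < d[v]:                               # line 11: Relax(u, v, w)
                d[v] = d[u] + w
                pi[v] = u
    return d, pi


example: AdjacencyList = {
    "s": [("a", 4), ("b", 1)],
    "a": [("c", 1), ("t", 6)],
    "b": [("a", 2), ("c", 5)],
    "c": [("t", 3)],
    "t": [],
}
d, pi = dijkstra_scan(example, "s")
print(d)  # {'s': 0.0, 'a': 3.0, 'b': 1.0, 'c': 4.0, 't': 7.0}
```

The scan-based queue is a deliberate trade: $|V|$ scans of $O(V)$ plus $O(1)$ per relaxation gives $O(V^2 + E)$, which is competitive with heap-based bounds on dense graphs.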

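Correctness rests on a cut argument: the same $S$ versus $V \setminus S$ split that powered the exchange proofs for minimum spanning trees. The claim is a loop invariant, namely $v.d = \delta(s, v)$ for every $v \in S$, and unwinding it across the whole run gives the theorem that justifies the greedy commitment.

Theorem. If every edge weight is non-negative, then each time $\textsc{Extract-Min}$ returns a vertex $u$, its estimate is final: $u.d = \delta(s, u)$.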

Proof. The first extraction is $s$ itself, with $s.d = 0 = \delta(s, s)$. Suppose, for contradiction, that extracting $u$ is the first extraction to violate the theorem. Then $u.d$ is either too low or too high.

Too low ($u.d < \delta(s, u)$) is impossible, because relaxation never drops an estimate below the true distance: every finite value of $u.d$ is the cost of some real $s \leadsto u$ walk.
Too high ($u.d > \delta(s, u)$): let $p$ be a genuine shortest path from $s$ to $u$. Walk $p$ from $s$ (inside $S$) toward $u$ (outside it), and let $(x, y)$ be the first edge that crosses the cut, so $x \in S$ and $y \notin S$. By the invariant $x.d = \delta(s, x)$, and $(x, y)$ was relaxed when $x$ was finalized, so $y.d \leq \delta(s, x) + w(x, y) = \delta(s, y)$; the equality holds because the prefix of $p$ ending at $y$ is itself a shortest path. Since estimates never drop below true distances, $y.d = \delta(s, y)$. Now $y$ sits on a shortest path to $u$, and because every weight is $\geq 0$, the rest of $p$ only adds cost: $\delta(s, y) \leq \delta(s, u) < u.d$. Hence $y.d < u.d$, which rules out $y = u$ and means $\textsc{Extract-Min}$ would have returned $y$, not $u$: contradiction. $\square$

Non-negativity does all the work in that last inequality: it guarantees that extending a path never decreases its cost, so the closest frontier vertex can be safely frozen. A single negative edge breaks $\delta(s, y) \leq \delta(s, u)$, so the greedy commitment becomes unsound; the failure is exhibited concretely below.
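The theorem can also be spot-checked empirically. The sketch below (all helper names hypothetical) builds random digraphs with weights in $0..9$, records the key each vertex has at the moment it is finalized, and compares it with a brute-force reference obtained by relaxing every edge $n - 1$ times. With non-negative weights the mismatch count should be zero; this is a sanity check under those assumptions, not a substitute for the proof.

```python
import heapq
import math
import random

Edge = tuple[int, int, int]  # (tail, head, weight)


def reference_distances(n: int, edges: list[Edge], s: int) -> list[float]:
    """Brute-force ground truth: relax every edge n - 1 times (valid with no negative cycles)."""
    d: list[float] = [math.inf] * n
    d[s] = 0
    for _ in range(n - 1):
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    return d


def keys_at_extraction(n: int, edges: list[Edge], s: int) -> dict[int, float]:
    """Lazy-deletion Dijkstra that records each vertex's key when it is finalized."""
    adj: list[list[tuple[int, int]]] = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
    d: list[float] = [math.inf] * n
    d[s] = 0
    frontier = [(0, s)]
    finalized: dict[int, float] = {}
    while frontier:
        key, u = heapq.heappop(frontier)
        if u in finalized:
            continue                 # stale entry for an already-finalized vertex
        finalized[u] = key           # the moment the theorem talks about
        for v, w in adj[u]:
            if key + w < d[v]:
                d[v] = key + w
                heapq.heappush(frontier, (d[v], v))
    return finalized


def count_violations(trials: int, seed: int) -> int:
    """Random digraphs with weights in 0..9; count extractions where u.d != delta(s, u)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        n = rng.randint(2, 8)
        edges = [(rng.randrange(n), rng.randrange(n), rng.randint(0, 9))
                 for _ in range(rng.randint(0, 20))]
        truth = reference_distances(n, edges, 0)
        for u, key in keys_at_extraction(n, edges, 0).items():
            if key != truth[u]:
                violations += 1
    return violations


print(count_violations(trials=500, seed=1))  # 0
```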

A complete run

Here is a full run: every $\textsc{Extract-Min}$, every successful relaxation, and the queue contents after each step. The graph has vertices $s, a, b, c, t$ and edges $s \to a$ ($4$), $s \to b$ ($1$), $b \to a$ ($2$), $b \to c$ ($5$), $a \to c$ ($1$), $a \to t$ ($6$), $c \to t$ ($3$). Read a table entry $4_{s}$ as $v.d = 4$ with $v.\pi = s$; bold marks a finalized estimate, which never moves again.

Step	Extracted (key)	$s$	$a$	$b$	$c$	$t$	Queue after (vertex: key)
init	none	$0$	$\infty$	$\infty$	$\infty$	$\infty$	$s: 0,\ a: \infty,\ b: \infty,\ c: \infty,\ t: \infty$
1	$s$ $(0)$	$\mathbf{0}$	$4_{s}$	$1_{s}$	$\infty$	$\infty$	$b: 1,\ a: 4,\ c: \infty,\ t: \infty$
2	$b$ $(1)$	$\mathbf{0}$	$3_{b}$	$\mathbf{1}_{s}$	$6_{b}$	$\infty$	$a: 3,\ c: 6,\ t: \infty$
3	$a$ $(3)$	$\mathbf{0}$	$\mathbf{3}_{b}$	$\mathbf{1}_{s}$	$4_{a}$	$9_{a}$	$c: 4,\ t: 9$
4	$c$ $(4)$	$\mathbf{0}$	$\mathbf{3}_{b}$	$\mathbf{1}_{s}$	$\mathbf{4}_{a}$	$7_{c}$	$t: 7$
5	$t$ $(7)$	$\mathbf{0}$	$\mathbf{3}_{b}$	$\mathbf{1}_{s}$	$\mathbf{4}_{a}$	$\mathbf{7}_{c}$	empty

Two steps in the trace show the mechanism. In step 2, extracting $b$ triggers a $\textsc{Decrease-Key}$ on $a$: the tentative $4_{s}$ (the direct edge) is beaten by $1 + 2 = 3$ through $b$, so $a$'s queue key drops and its predecessor is rewired. In step 3, the same thing happens to $c$: $6_{b}$ falls to $4_{a}$. Both improvements arrive before the affected vertex is extracted; the theorem guarantees this ordering can never fail with non-negative weights. The final predecessor array spells out the shortest-path tree: $t \leftarrow c \leftarrow a \leftarrow b \leftarrow s$.

The run as a state sequence, one panel per $\textsc{Extract-Min}$. Shaded vertices are finalized; the label beside each vertex is its current key (black once finalized, blue while still in the queue), and the blue edges are the ones relaxed in that step.

Vertices finalize in nondecreasing order of distance ($s, b, a, c, t$), a direct consequence of the greedy invariant. Notice that $a$ is finalized at distance $3$ via the two-hop route $s \to b \to a$, beating the direct edge $s \to a$ of weight $4$; the relaxation through $b$ fired before $a$ was ever extracted:

Dijkstra finalizes vertices in nondecreasing distance: $s\ (0), b\ (1), a\ (3), c\ (4), t\ (7)$. Vertex $a$ settles at $3$ via $s \to b \to a$, beating the direct edge of weight $4$.
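To reproduce the table mechanically, a small sketch can snapshot $(v.d, v.\pi)$ after each extraction. It assumes Python's `heapq`, which has no $\textsc{Decrease-Key}$, so a successful relaxation pushes a new entry and the outdated one is skipped when it surfaces; the rows still match the table because only real extractions are recorded. The function name `trace` is illustrative.

```python
import heapq
import math

# The lesson's example digraph: (tail, head, weight).
EDGES = [("s", "a", 4), ("s", "b", 1), ("b", "a", 2), ("b", "c", 5),
         ("a", "c", 1), ("a", "t", 6), ("c", "t", 3)]


def trace(edges, source):
    """Dijkstra with a lazy-deletion heap, returning one table row per
    Extract-Min: (extracted vertex, its key, {v: (v.d, v.pi)} after relaxing)."""
    vertices = sorted({x for u, v, _ in edges for x in (u, v)})
    adj = {x: [] for x in vertices}
    for u, v, w in edges:
        adj[u].append((v, w))
    d = {x: math.inf for x in vertices}
    pi = {x: None for x in vertices}
    d[source] = 0
    frontier, finalized, rows = [(0, source)], set(), []
    while frontier:
        key, u = heapq.heappop(frontier)
        if u in finalized:
            continue                      # stale entry superseded by a cheaper push
        finalized.add(u)
        for v, w in adj[u]:
            if key + w < d[v]:            # successful relaxation: one key decrease
                d[v], pi[v] = key + w, u
                heapq.heappush(frontier, (d[v], v))
        rows.append((u, key, {x: (d[x], pi[x]) for x in vertices}))
    return rows


for step, (u, key, row) in enumerate(trace(EDGES, "s"), start=1):
    cells = ", ".join(f"{x}={dx}_{px}" for x, (dx, px) in row.items())
    print(f"{step}: extract {u} ({key})  {cells}")
```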

Running time. Like Prim, Dijkstra does exactly $|V|$ $\textsc{Extract-Min}$ operations (each vertex leaves the queue once) and at most $|E|$ $\textsc{Decrease-Key}$ operations (each edge is relaxed once, when its tail is extracted, and each successful relaxation is one key decrease). The total is therefore

$$V \cdot T_{\textsc{Extract-Min}} + E \cdot T_{\textsc{Decrease-Key}}.$$

With a binary heap both operations cost $O(\log V)$, giving $O((V + E) \log V)$, which is $O(E \log V)$ whenever every vertex is reachable, since then $E \geq V - 1$. A Fibonacci heap makes $\textsc{Decrease-Key}$ amortized $O(1)$, improving the bound to $O(E + V \log V)$. The gap matters most on dense graphs: with $E = \Theta(V^2)$, the binary heap can pay $\Theta(V^2 \log V)$ in the worst case while the Fibonacci heap pays $\Theta(V^2)$.
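The operation counts can be checked by instrumenting a lazy-deletion loop, a common substitute for $\textsc{Decrease-Key}$ when the heap (here Python's `heapq`) does not support it. Each successful relaxation becomes one extra push, so the heap holds at most $1 + |E|$ entries and every heap operation costs $O(\log E) = O(\log V)$ on a simple graph, since $E < V^2$; the overall bound stays the binary-heap one. The function name `count_operations` is hypothetical.

```python
import heapq
from collections import Counter

# The lesson's example digraph as adjacency lists (vertex -> [(neighbor, weight)]).
EXAMPLE = {
    "s": [("a", 4), ("b", 1)],
    "a": [("c", 1), ("t", 6)],
    "b": [("a", 2), ("c", 5)],
    "c": [("t", 3)],
    "t": [],
}


def count_operations(adj: dict[str, list[tuple[str, float]]], source: str) -> Counter:
    """Lazy-deletion Dijkstra instrumented to count its heap traffic.

    heapq has no Decrease-Key, so each successful relaxation pushes a fresh
    (key, vertex) entry and the superseded entry is discarded when popped.
    """
    ops: Counter = Counter(push=1)          # the initial (0, source) entry
    dist = {source: 0}
    finalized: set[str] = set()
    heap = [(0, source)]
    while heap:
        key, u = heapq.heappop(heap)
        ops["pop"] += 1
        if u in finalized:
            ops["stale pop"] += 1
            continue
        finalized.add(u)
        ops["extract-min"] += 1             # a real extraction, once per reachable vertex
        for v, w in adj[u]:
            if key + w < dist.get(v, float("inf")):
                dist[v] = key + w
                ops["decrease-key"] += 1    # one successful relaxation...
                ops["push"] += 1            # ...implemented as one extra push
                heapq.heappush(heap, (dist[v], v))
    return ops


print(dict(count_operations(EXAMPLE, "s")))
```

On the example graph this counts $5$ extractions ($|V|$), $7$ key decreases (here exactly $|E| = 7$), and $3$ stale pops left behind by the superseded keys of $a$, $c$, and $t$.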

Why negative edges break it

The theorem leaned on non-negativity exactly once, in the step "the rest of $p$ only adds cost", and one negative edge is enough to break it. Take three vertices: $s \to a$ with weight $1$, $s \to b$ with weight $2$, and $b \to a$ with weight $-2$. The true distance to $a$ is $\delta(s, a) = 2 + (-2) = 0$ via $b$. But Dijkstra extracts $s$, then extracts $a$ (key $1$, the current minimum) and finalizes it with $a.d = 1 \neq \delta(s, a)$. Only afterward does it extract $b$ and try the edge $(b, a)$: the relaxation $2 + (-2) = 0 < 1$ succeeds, but $a$ has already left the queue and the algorithm never revisits a finalized vertex, so every edge leaving $a$ was relaxed with the stale value $1$ and is never relaxed again. The greedy schedule processed $a$ before the cheap route to it existed.

A negative edge poisons the greedy choice. Here $\delta(s, a) = 0$ via $s \to b \to a$, but Dijkstra extracts $a$ second with $a.d = 1$ and freezes it; the improving relaxation of $(b, a)$ fires only after $b$ is extracted, too late for a finalized vertex.
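A short sketch makes the failure concrete. It runs a lazy-deletion Dijkstra with no negative-weight check and logs the key each vertex had when it was finalized. On the three-vertex graph from the text, $a$ is finalized with key $1$ although $\delta(s, a) = 0$; the late relaxation of $(b, a)$ still lowers the stored label to $0$, but nothing downstream of $a$ is revisited. The edge $a \to t$ of weight $1$ in the second graph is hypothetical, added only to expose that: $t$ ends at $2$ although $\delta(s, t) = 0 + 1 = 1$.

```python
import heapq


def dijkstra_freeze_log(adj: dict[str, list[tuple[str, float]]], source: str):
    """Lazy-deletion Dijkstra run without checking for negative weights.

    Returns the final estimates and, separately, the key each vertex had at
    the moment it was finalized (what the theorem claims equals delta(s, v)).
    """
    dist = {source: 0}
    frozen_at: dict[str, float] = {}
    heap = [(0, source)]
    while heap:
        key, u = heapq.heappop(heap)
        if u in frozen_at:
            continue                     # finalized vertices are never processed again
        frozen_at[u] = key
        for v, w in adj.get(u, []):
            if key + w < dist.get(v, float("inf")):
                dist[v] = key + w                   # may lower an already-finalized label...
                heapq.heappush(heap, (dist[v], v))  # ...but that entry will be skipped
    return dist, frozen_at


# The three-vertex graph from the text.
lesson = {"s": [("a", 1), ("b", 2)], "b": [("a", -2)]}
# Hypothetical extension (not in the lesson): one edge a -> t of weight 1.
extended = {**lesson, "a": [("t", 1)]}

print(dijkstra_freeze_log(lesson, "s"))
print(dijkstra_freeze_log(extended, "s"))
```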

A tempting repair, adding a constant $c$ to every edge weight until none is negative, fails because it penalizes paths in proportion to their hop count: a three-edge path gains $3c$ while a one-edge path gains only $c$, so the reweighted graph can have a different shortest path. Handling negative edges requires giving up the greedy schedule in favor of dynamic programming.
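The hop-count penalty is easy to check by brute force on a tiny hypothetical graph (not one of the lesson's figures): a direct edge $s \to t$ of weight $1$ and a three-edge route $s \to a \to b \to t$ with weights $1, -2, 1$. Shifting every weight by $c = 2$ makes all weights non-negative, but the direct edge gains $2$ while the detour gains $6$, so the cheapest path flips. The enumeration helpers are illustrative and take exponential time in general.

```python
# Edge weights as nested dicts: adj[u][v] is the weight of u -> v.
Weights = dict[str, dict[str, float]]


def simple_paths(adj: Weights, u: str, t: str, path: list[str]):
    """Yield every simple path from u to t that extends `path` (which ends at u)."""
    if u == t:
        yield list(path)
        return
    for v in adj.get(u, {}):
        if v not in path:
            path.append(v)
            yield from simple_paths(adj, v, t, path)
            path.pop()


def cost(adj: Weights, path: list[str]) -> float:
    return sum(adj[x][y] for x, y in zip(path, path[1:]))


def cheapest(adj: Weights, s: str, t: str) -> list[str]:
    """Brute force over simple paths; enough when there is no negative cycle."""
    return min(simple_paths(adj, s, t, [s]), key=lambda p: cost(adj, p))


def shifted(adj: Weights, c: float) -> Weights:
    """Add the constant c to every edge weight."""
    return {u: {v: w + c for v, w in nbrs.items()} for u, nbrs in adj.items()}


g = {"s": {"t": 1, "a": 1}, "a": {"b": -2}, "b": {"t": 1}}
print(cheapest(g, "s", "t"), cost(g, cheapest(g, "s", "t")))     # ['s', 'a', 'b', 't'] 0
g2 = shifted(g, 2)  # c = 2 is the smallest shift that clears the -2 edge
print(cheapest(g2, "s", "t"), cost(g2, cheapest(g2, "s", "t")))  # ['s', 't'] 3
```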

dijkstra.py

import heapq
from collections.abc import Hashable
from typing import Generic, NamedTuple, Optional, TypeVar

from graph import Graph

Label = TypeVar("Label", bound=Hashable)

# NamedTuple combined with Generic requires Python 3.11 or newer.
class ShortestPaths(NamedTuple, Generic[Label]):
  """
    The result of a single-source search: the distance to every vertex and
    the predecessor on the shortest-path tree (None for the source and for
    unreachable vertices).
  """
  distance: dict[Label, float]
  predecessor: dict[Label, Optional[Label]]

  def path_to(self, target: Label) -> Optional[list[Label]]:
    """
      The shortest path from the source to `target` as a vertex list,
      or None if `target` is unreachable.
    """
    if self.distance.get(target, float("inf")) == float("inf"):
      return None

    # walk predecessors back from the target to the source, then flip.
    path: list[Label] = []
    cursor: Optional[Label] = target
    while cursor is not None:
      path.append(cursor)
      cursor = self.predecessor[cursor]

    path.reverse()
    return path

def dijkstra(graph: Graph[Label], source: Label) -> ShortestPaths[Label]:
  """
    Shortest-path distances from `source` to every vertex of `graph`.
    Requires all edge weights to be non-negative; a negative edge can
    silently violate the greedy finalization and corrupt the result.
  """
  # every estimate starts at infinity except the source at zero.
  distance: dict[Label, float] = {label: float("inf") for label in graph.labels}
  predecessor: dict[Label, Optional[Label]] = {label: None for label in graph.labels}
  distance[source] = 0.0

  # heap entries are (estimate, label); a finalized set skips stale ones.
  frontier: list[tuple[float, Label]] = [(0.0, source)]
  finalized: set[Label] = set()
  while frontier:
    current_distance, current_label = heapq.heappop(frontier)

    # a label is popped once with its true distance; later pops are stale.
    if current_label in finalized:
      continue
    finalized.add(current_label)

    for edge in graph.vertex(current_label).outgoing:
      neighbor_label: Label = edge.target.label
      candidate: float = current_distance + edge.weight

      # relaxation: a cheaper route to the neighbor via the current vertex.
      if candidate < distance[neighbor_label]:
        distance[neighbor_label] = candidate
        predecessor[neighbor_label] = current_label
        heapq.heappush(frontier, (candidate, neighbor_label))

  return ShortestPaths(distance, predecessor)

graph.py

from __future__ import annotations  # Edge and Vertex refer to each other in annotations

from collections.abc import Hashable, Iterator
from typing import Generic, Optional, TypeVar


Label = TypeVar("Label", bound=Hashable)


class Edge(Generic[Label]):
  """
    A directed connection from `source` to `target`, carrying a weight.
  """

  def __init__(
    self,
    source: Vertex[Label],
    target: Vertex[Label],
    weight: float = 1.0,
  ) -> None:
    self.source: Vertex[Label] = source
    self.target: Vertex[Label] = target
    self.weight: float = weight

  def __repr__(self) -> str:
    return f"Edge({self.source.label!r} -> {self.target.label!r}, w={self.weight})"


class Vertex(Generic[Label]):
  """
    A graph vertex: a label plus the list of edges leaving it.
  """

  def __init__(self, label: Label) -> None:
    self.label: Label = label
    self.outgoing: list[Edge[Label]] = []

  def neighbors(self) -> list[Vertex[Label]]:
    """
      The vertices reachable from this one by a single edge.
    """
    return [edge.target for edge in self.outgoing]

  def edge_to(self, label: Label) -> Optional[Edge[Label]]:
    """
      The outgoing edge to the vertex with `label`, or None.
    """
    for edge in self.outgoing:
      if edge.target.label == label:
        return edge
    return None

  def __repr__(self) -> str:
    return f"Vertex({self.label!r})"


class Graph(Generic[Label]):
  """
    A graph of Vertex objects linked by Edge objects.
    Pass `directed=True` for a digraph; otherwise each `add_edge` inserts
    the reverse edge too.
  """

  def __init__(self, directed: bool = False) -> None:
    self.directed: bool = directed
    self._vertices: dict[Label, Vertex[Label]] = {}

  def add_vertex(self, label: Label) -> Vertex[Label]:
    """
      Return the vertex for `label`, creating it if it is absent.
    """
    # reuse the existing vertex, or mint and register a fresh one.
    vertex = self._vertices.get(label)
    if vertex is None:
      vertex = Vertex(label)
      self._vertices[label] = vertex
    return vertex

  def add_edge(
    self,
    source_label: Label,
    target_label: Label,
    weight: float = 1.0,
  ) -> None:
    """
      Connect two labels (creating either vertex as needed).
      Adds the reverse edge as well when the graph is undirected.
    """
    source = self.add_vertex(source_label)
    target = self.add_vertex(target_label)

    # link source to target, and mirror it back when undirected.
    source.outgoing.append(Edge(source, target, weight))
    if not self.directed:
      target.outgoing.append(Edge(target, source, weight))

  def vertex(self, label: Label) -> Vertex[Label]:
    """
      The vertex carrying `label` (raises KeyError if absent).
    """
    return self._vertices[label]

  @property
  def vertices(self) -> list[Vertex[Label]]:
    """
      Every vertex, in insertion order.
    """
    return list(self._vertices.values())

  @property
  def labels(self) -> list[Label]:
    """
      Every vertex label, in insertion order.
    """
    return list(self._vertices)

  def edges(self) -> Iterator[Edge[Label]]:
    """
      Each edge once; an undirected edge is yielded a single time.
    """
    # track undirected endpoint pairs so each is emitted only once.
    seen: set[frozenset[Label]] = set()

    for vertex in self._vertices.values():
      for edge in vertex.outgoing:
        # skip an undirected edge already yielded from the other endpoint.
        if not self.directed:
          endpoints = frozenset((edge.source.label, edge.target.label))
          if endpoints in seen:
            continue
          seen.add(endpoints)

        yield edge

  def __contains__(self, label: Label) -> bool:
    return label in self._vertices

  def __iter__(self) -> Iterator[Vertex[Label]]:
    return iter(self._vertices.values())

  def __len__(self) -> int:
    return len(self._vertices)

How a map app really routes

Dijkstra explores in every direction at once, which is wasteful on a continent-sized road network. Production route planners keep the relaxation primitive but prune the search hard.

A* search. Give the algorithm a heuristic $h(v)$, a lower bound on the remaining distance from $v$ to the target $t$ (straight-line distance on a map). A* extracts the vertex minimizing $v.d + h(v)$ instead of $v.d$, biasing the frontier toward the goal.² When $h$ is admissible (never overestimates) and consistent ($h(u) \leq w(u, v) + h(v)$ for every edge), A* returns an exact shortest path while typically settling far fewer vertices than Dijkstra; with $h \equiv 0$ it is Dijkstra, so the two sit on one spectrum. In fact A* is Dijkstra on the reweighted graph $w'(u, v) = w(u, v) - h(u) + h(v)$, and consistency is exactly what keeps those weights non-negative.

A* versus Dijkstra on a grid. Dijkstra's frontier (light) expands as a disk around $s$; A*'s (blue) is pulled toward $t$ by the heuristic, settling far fewer cells.
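A sketch of the comparison in the figure, assuming an obstacle-free $10 \times 10$ grid with unit-cost moves to the four neighbors and the Manhattan distance as heuristic (admissible and consistent on such a grid). One search routine serves both algorithms: it orders the heap by $g + h(\text{cell})$, so passing $h \equiv 0$ gives Dijkstra stopped when the target is settled. All names are hypothetical.

```python
import heapq
import math
from typing import Callable

Cell = tuple[int, int]


def grid_search(size: int, start: Cell, goal: Cell,
                h: Callable[[Cell], int]) -> tuple[int, int]:
    """Best-first search on an open size-by-size grid with unit-cost 4-neighbor moves.

    The heap is keyed by g + h(cell), where g is the relaxed distance estimate.
    h = 0 everywhere gives Dijkstra; a consistent h gives A*.
    Returns (distance to goal, number of settled cells).
    """
    best: dict[Cell, float] = {start: 0}
    settled: set[Cell] = set()
    frontier: list[tuple[int, int, Cell]] = [(h(start), 0, start)]
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in settled:
            continue                      # stale entry superseded by a cheaper push
        settled.add(cell)
        if cell == goal:
            return g, len(settled)
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size:
                candidate = g + 1         # relax the unit-weight edge cell -> nxt
                if candidate < best.get(nxt, math.inf):
                    best[nxt] = candidate
                    heapq.heappush(frontier, (candidate + h(nxt), candidate, nxt))
    raise ValueError("goal is unreachable")


def zero(cell: Cell) -> int:
    return 0


def manhattan_to(goal: Cell) -> Callable[[Cell], int]:
    def h(cell: Cell) -> int:
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
    return h


start, goal = (0, 5), (9, 5)
print("Dijkstra:", grid_search(10, start, goal, zero))
print("A*:      ", grid_search(10, start, goal, manhattan_to(goal)))
```

Both calls report distance $9$; on this layout A* settles only the $10$ cells on the straight line from $(0, 5)$ to $(9, 5)$, while Dijkstra settles every cell closer to the start than the target.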

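Bidirectional search runs two Dijkstras at once, one forward from $s$ and one backward from $t$ over reversed edges, and stops once the frontiers have met and the smallest keys in the two queues sum to at least the best $s$–$t$ path found so far. (The first vertex settled by both searches need not lie on a shortest path, so stopping at the first meeting is not enough.) Each search reaches only about half the distance, so together they explore two balls of radius about $\delta(s, t)/2$ instead of one of radius $\delta(s, t)$: on a roughly planar road network that halves the searched area, and on graphs whose neighborhoods grow exponentially with radius it halves the exponent.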
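A sketch of bidirectional Dijkstra, assuming non-negative weights and an edge list of `(tail, head, weight)` triples; the function name is hypothetical. The backward search walks reversed edges from $t$. Whenever either search scans an edge into a vertex the other search has already labeled, the two half-paths form a real $s$–$t$ path, and the cheapest one seen so far is kept as `best`. The loop stops only when the smallest keys in the two queues sum to at least `best`, since the first vertex reached by both searches need not lie on a shortest path.

```python
import heapq
import math
from collections import defaultdict


def bidirectional_dijkstra(edges: list[tuple[str, str, float]], s: str, t: str) -> float:
    """delta(s, t) via two lazy-deletion Dijkstra searches meeting in the middle."""
    forward, backward = defaultdict(list), defaultdict(list)
    for u, v, w in edges:
        forward[u].append((v, w))
        backward[v].append((u, w))        # edge u -> v, walked from v back to u
    adj = (forward, backward)
    dist = ({s: 0.0}, {t: 0.0})
    heaps = ([(0.0, s)], [(0.0, t)])
    settled = (set(), set())
    best = 0.0 if s == t else math.inf    # cheapest s-t path seen so far
    while heaps[0] and heaps[1]:
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break                         # no unexplored route can beat best
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1  # advance the smaller key
        key, u = heapq.heappop(heaps[side])
        if u in settled[side]:
            continue                      # stale entry
        settled[side].add(u)
        other = dist[1 - side]
        for v, w in adj[side][u]:
            candidate = key + w
            if candidate < dist[side].get(v, math.inf):
                dist[side][v] = candidate
                heapq.heappush(heaps[side], (candidate, v))
            if v in other:                # the two half-paths join through edge (u, v)
                best = min(best, candidate + other[v])
    return best


EDGES = [("s", "a", 4), ("s", "b", 1), ("b", "a", 2), ("b", "c", 5),
         ("a", "c", 1), ("a", "t", 6), ("c", "t", 3)]
print(bidirectional_dijkstra(EDGES, "s", "t"))  # 7.0
```

On the lesson's example graph it returns $7$ for $s$ to $t$, matching the one-directional run.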

Contraction hierarchies go further in the road-network case, where the graph is fixed and queried millions of times.³ A one-time preprocessing pass ranks vertices by importance and adds shortcut edges that bypass unimportant ones, so a query only ever climbs the hierarchy from $s$ and $t$ toward their meeting point. After preprocessing, continent-scale point-to-point queries finish in well under a millisecond, which is the kind of machinery behind the instant routes in a navigation app.

On the theory side, Duan, Mao, Mao, Shu, and Yin (2025) gave the first SSSP algorithm to break Dijkstra's sorting bottleneck on directed graphs with real non-negative weights, running in $O(m \log^{2/3} n)$ time for $m$ edges and $n$ vertices. That is below $O(m + n \log n)$ on sparse graphs: evidence that even this settled-seeming problem still has room for improvement.⁴

This continues in All-Pairs and Negative Weights, where we give up the greedy schedule to handle negative edges (Bellman-Ford as a dynamic program) and compute the distance between every pair of vertices (Floyd-Warshall).

Footnotes

1. CLRS, Ch. 24 & 25 (Single-Source and All-Pairs Shortest Paths): relaxation, the triangle inequality, and optimal substructure.
2. Hart, Nilsson & Raphael (1968), A Formal Basis for the Heuristic Determination of Minimum Cost Paths, IEEE Trans. Systems Science and Cybernetics 4(2), 100–107: the A* algorithm and its admissibility conditions.
3. Geisberger, Sanders, Schultes & Delling (2008), Contraction Hierarchies: Faster and Simpler Hierarchical Routing in Road Networks, Proc. WEA 2008: shortcut-based preprocessing for fast road-network queries.
4. Duan, Mao, Mao, Shu & Yin (2025), Breaking the Sorting Barrier for Directed Single-Source Shortest Paths, Proc. STOC 2025: SSSP in $O(m \log^{2/3} n)$, below Dijkstra's sorting bound.

The problem and its primitive

Dijkstra's algorithm

A complete run

Why negative edges break it

How a map app really routes

Footnotes