The Greedy Method

A greedy algorithm builds up a solution piece by piece, and at every step it takes the option that looks best right now (the largest, the smallest, the cheapest, the soonest), ignoring the choices still to come and never revisiting a choice already made. The bet is that locally optimal choices add up to a globally optimal solution.

Sometimes the bet pays off, and the result is a simple and fast algorithm. Often it does not, and the algorithm returns wrong answers. The central task of the greedy method is telling the two cases apart, and the only reliable way is a proof. Erickson puts the warning bluntly: most greedy algorithms are wrong, and a greedy strategy that has not been proved correct should be treated as a plausible guess, nothing more.¹

What makes greedy work

Dynamic programming, which we meet in a later module, considers all the ways a problem decomposes and picks the best. Greedy algorithms commit to one choice immediately and recurse on what remains. Two structural properties make that commitment valid.

The greedy-choice property.² There exists an optimal solution that contains the greedy (locally optimal) first choice. We never have to look ahead: a best-looking-now choice is safe, since some optimal solution agrees with it.
Optimal substructure. After making the greedy choice, what remains is a smaller instance of the same problem, and an optimal solution to the whole is the greedy choice plus an optimal solution to that subproblem.

Optimal substructure is shared with dynamic programming. The greedy-choice property is the extra ingredient: it collapses the many subproblems DP would explore down to a single one. That is why greedy algorithms, when they work, are so much faster than their DP cousins. Proving these two properties, not running the code on a few examples, is what separates a correct greedy algorithm from a hopeful heuristic.

The canonical example: activity selection

We are given $n$ activities that compete for one resource: a lecture hall, a tennis court, a single CPU. Activity $i$ has a start time $s_{i}$ and a finish time $f_{i}$, and occupies the half-open interval $[s_{i}, f_{i})$. Two activities are compatible if their intervals do not overlap. We want to select a largest possible set of mutually compatible activities.

Drawn on a number line, an instance and one optimal schedule look like this. The shaded bars are the chosen activities; they tile the timeline without overlap.

Timeline of activities as interval bars with a maximum compatible set shaded.

The chosen set $\{a, c, e, g\}$ uses four activities; no compatible set is larger. The question is which greedy rule finds such a set.

Choosing the right greedy rule

Several plausible rules suggest themselves, and most are wrong:

Earliest start first? A single activity that starts at time $0$ but runs forever blocks everything — wrong.
Shortest duration first? A short activity wedged between two longer ones can knock out two compatible activities to gain one — wrong.
Fewest conflicts first? Tempting, but constructible counterexamples defeat it too.

Two of these failures are easy to picture. Earliest-start picks the long bar that blocks the whole timeline; shortest-job picks the short middle bar that displaces both of its neighbors.

Two greedy rules that fail activity selection. Top, earliest-start: a single early-starting job (red) blocks three compatible ones (green). Bottom, shortest-duration: a short job (red) evicts the two longer jobs (green) flanking it.
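The failing rules can be checked mechanically. The sketch below is illustrative only: the helper `greedy_by`, the rule functions, and the three small instances (`blocker`, `wedge`, `crowded`) are assumptions made for this example, not part of the lesson's code. Each instance is built so that one rule returns fewer activities than earliest-finish does on the same input.

```python
from typing import Callable

Interval = tuple[int, int]  # half-open [start, finish), with start < finish
Rule = Callable[[Interval, list[Interval]], float]


def overlaps(a: Interval, b: Interval) -> bool:
  """Whether two half-open intervals share any point."""
  return a[0] < b[1] and b[0] < a[1]


def greedy_by(intervals: list[Interval], rule: Rule) -> list[Interval]:
  """Repeatedly take the remaining interval that minimizes `rule`."""
  remaining = list(intervals)
  chosen: list[Interval] = []
  while remaining:
    best = min(remaining, key=lambda iv: rule(iv, remaining))
    chosen.append(best)
    # drop the pick itself and everything that conflicts with it.
    remaining = [iv for iv in remaining if not overlaps(iv, best)]
  return chosen


def earliest_start(iv: Interval, rest: list[Interval]) -> float:
  return iv[0]


def shortest_duration(iv: Interval, rest: list[Interval]) -> float:
  return iv[1] - iv[0]


def fewest_conflicts(iv: Interval, rest: list[Interval]) -> float:
  # counts iv itself as well; the constant offset does not change the ranking.
  return sum(overlaps(iv, other) for other in rest)


def earliest_finish(iv: Interval, rest: list[Interval]) -> float:
  return iv[1]


# one long early job blocks three short compatible ones.
blocker = [(0, 10), (1, 3), (4, 6), (7, 9)]
# a short job wedged between two longer, mutually compatible jobs.
wedge = [(0, 5), (4, 6), (5, 10)]
# four compatible jobs, plus a lightly conflicted job flanked by triplicated decoys.
crowded = [
  (0, 2), (3, 5), (6, 8), (9, 11),
  (1, 4), (1, 4), (1, 4),
  (4, 7),
  (7, 10), (7, 10), (7, 10),
]

for name, instance, rule in [
  ("earliest start", blocker, earliest_start),
  ("shortest duration", wedge, shortest_duration),
  ("fewest conflicts", crowded, fewest_conflicts),
]:
  bad = len(greedy_by(instance, rule))
  good = len(greedy_by(instance, earliest_finish))
  print(f"{name}: {bad} chosen, earliest finish: {good} chosen")
```

A few counterexamples like these can rule a greedy rule out; they cannot rule one in, which is why the surviving rule still needs a proof.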

The rule that works is earliest finish time first: repeatedly pick the compatible activity that finishes soonest.³ The intuition: finishing early frees the hall as soon as possible, leaving the most room for everything that follows. This is the crux of interval scheduling.

Algorithm 1: \textsc{Greedy-Activity-Select}$(s, f)$: choose a maximum-size set of compatible activities

  sort activities so that $f_1 \le f_2 \le \cdots \le f_n$
  $S \gets \{1\}$    // earliest finish is safe
  $k \gets 1$    // last activity added to $S$
  for $m \gets 2$ to $n$ do
    if $s_m \ge f_k$ then    // $m$ starts after $k$ finishes
      $S \gets S \cup \{m\}$
      $k \gets m$
  return $S$

After the one-time sort by finish time, a single linear scan does the rest: $\Theta(n)$ work, for $\Theta(n \log n)$ total, dominated entirely by the sort. If the finish times arrive already sorted, the selection itself is linear.

Run the scan on the instance above. Sorting the seven activities by finish time gives the order $a (f = 2), b (4), c (5), d (7), e (7), g (10), h (11)$. The scan keeps a single number, $f_{k}$, the finish time of the last accepted activity, and admits the next activity exactly when its start is at least $f_{k}$.

Step	Activity	$[s, f)$	$f_{k}$ before	$s \geq f_{k}$ ?	Action
1	$a$	$[0, 2)$	—	—	accept, $f_{k} \leftarrow 2$
2	$b$	$[1, 4)$	$2$	$1 \geq 2$ ? no	reject
3	$c$	$[3, 5)$	$2$	$3 \geq 2$ ? yes	accept, $f_{k} \leftarrow 5$
4	$d$	$[3, 7)$	$5$	$3 \geq 5$ ? no	reject
5	$e$	$[5, 7)$	$5$	$5 \geq 5$ ? yes	accept, $f_{k} \leftarrow 7$
6	$g$	$[7, 10)$	$7$	$7 \geq 7$ ? yes	accept, $f_{k} \leftarrow 10$
7	$h$	$[7, 11)$	$10$	$7 \geq 10$ ? no	reject

The scan accepts $\{a, c, e, g\}$, the four-activity optimum drawn earlier. Each rejection happens because the candidate starts before the hall is free again; each acceptance advances $f_{k}$ to the new, later finish. The figure below traces the same run, marking every activity as accepted (blue) or rejected (struck through) in finish-time order.

Earliest-finish scan on the seven-activity instance. Activities are laid out in finish-time order; blue bars are accepted, gray struck-through bars rejected. The dashed line marks f_k, the finish of the last accepted activity, after each step.

activity_selection.py (Python)

from typing import NamedTuple, Sequence

class Activity(NamedTuple):
  """
    One activity competing for the shared resource: a half-open interval\n
    [start, finish) on the timeline.\n
  """
  start: float
  finish: float

def select_activities(activities: Sequence[Activity]) -> list[Activity]:
  """
    A maximum-size set of pairwise compatible activities, in finish order.\n
    Two activities are compatible when their intervals do not overlap; the\n
    half-open convention means one ending exactly when another begins is\n
    allowed.\n
  """
  # earliest-finish order is the safe greedy ordering.
  by_finish: list[Activity] = sorted(activities, key=lambda a: a.finish)
  chosen: list[Activity] = []

  # take each activity that starts no earlier than the last one finished;
  # -inf admits the first.
  last_finish: float = float("-inf")
  for activity in by_finish:
    if activity.start >= last_finish:
      chosen.append(activity)
      last_finish = activity.finish

  return chosen
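As a usage sketch, the scan can be replayed on the seven-activity instance from the trace table. The block is self-contained, so it uses its own small `Activity` tuple with an added `name` field and a hypothetical helper `trace_earliest_finish`; neither belongs to the listing above.

```python
from typing import NamedTuple, Optional


class Activity(NamedTuple):
  name: str
  start: int
  finish: int


def trace_earliest_finish(
  activities: list[Activity],
) -> list[tuple[str, Optional[int], bool]]:
  """One row per activity in finish order: (name, f_k before, accepted?)."""
  rows: list[tuple[str, Optional[int], bool]] = []
  last_finish: Optional[int] = None
  # sorted() is stable, so activities with equal finish keep their input order.
  for activity in sorted(activities, key=lambda a: a.finish):
    accepted = last_finish is None or activity.start >= last_finish
    rows.append((activity.name, last_finish, accepted))
    if accepted:
      last_finish = activity.finish
  return rows


instance = [
  Activity("a", 0, 2),
  Activity("b", 1, 4),
  Activity("c", 3, 5),
  Activity("d", 3, 7),
  Activity("e", 5, 7),
  Activity("g", 7, 10),
  Activity("h", 7, 11),
]

rows = trace_earliest_finish(instance)
accepted = [name for name, _, ok in rows if ok]
print(accepted)  # ['a', 'c', 'e', 'g']
```

The rows reproduce the table: each rejected activity starts before the recorded $f_{k}$, and each accepted one moves $f_{k}$ forward.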

Correctness by the exchange argument

The usual proof technique for greedy algorithms is the exchange argument: take any optimal solution, and show you can transform it, swapping one of its choices for the greedy choice, without making it worse. Since the result is no worse, it is still optimal, and it now agrees with greedy on the first choice. That establishes the greedy-choice property; optimal substructure then finishes the job by induction.

The picture of the swap is the whole argument in one image. Activity $1$ slides in where $j$ was, finishing at least as early, so nothing downstream can break.

Exchange argument swapping activity $j$ for the earliest-finishing activity $1$.

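With the greedy-choice lemma in hand (some optimal solution contains the earliest-finishing activity $1$), optimal substructure completes the proof by induction: once activity $1$ is chosen, the activities that start at or after $f_{1}$ form a smaller instance of the same problem, and by the inductive hypothesis greedy solves that instance optimally.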
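The swap itself can be written out in symbols. The notation below ($O$ for an optimal set, $O'$ for the swapped set) is introduced here for illustration; it assumes activities are indexed by finish time, so activity $1$ finishes first, and that every activity has positive duration.

```latex
\textbf{Greedy-choice lemma.} Some maximum-size compatible set contains activity $1$.

\textbf{Proof sketch.} Let $O$ be any maximum-size compatible set and let $j$ be
its earliest-finishing member. If $j = 1$ there is nothing to do. Otherwise define
\[
  O' = (O \setminus \{j\}) \cup \{1\}.
\]
Take any $i \in O \setminus \{j\}$. Since $i$ is compatible with $j$ and
$f_i \ge f_j$, activity $i$ cannot end before $j$ starts, so it must start after
$j$ ends:
\[
  s_i \ge f_j \ge f_1 .
\]
Hence activity $1$ is compatible with every such $i$, so $O'$ is a compatible set
with $|O'| = |O|$. It is therefore also maximum, and it contains activity $1$.
\qquad\blacksquare
```

The only facts used are $f_1 \le f_j$ and the compatibility of $O$, which is why the swap can never break anything downstream.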

This two-step shape, (1) an exchange argument for the greedy-choice property and (2) induction via optimal substructure, is the template for every greedy correctness proof in this course, Huffman codes and minimum spanning trees included.

When greedy fails: the 0/1 knapsack

Greedy does not always work; the standard counterexample is the knapsack problem. We have a knapsack of capacity $W$ and $n$ items, item $i$ having weight $w_{i}$ and value $v_{i}$. We want the most valuable load that fits.

In the fractional knapsack, we may take any fraction of an item. Here greed works perfectly: sort by value density $v_{i} / w_{i}$, then fill with the densest items first, taking a fraction of the last one to top off the capacity exactly.⁴ An exchange argument proves it: any optimal solution that takes less of a denser item and more of a sparser one can be nudged toward the greedy choice without losing value.

Run the density rule concretely. Take capacity $W = 50$ and three items, already listed in decreasing density:

Item	Weight $w_{i}$	Value $v_{i}$	Density $v_{i} / w_{i}$
1	10	60	6.0
2	20	100	5.0
3	30	120	4.0

Greedy takes item $1$ whole (using $10$ of $50$, value $60$), then item $2$ whole (using $30$ of $50$, value $160$), then only a fraction of item $3$: $20$ of its $30$ units fit, so it takes $\frac{20}{30} = \frac{2}{3}$ of it for $\frac{2}{3} \cdot 120 = 80$ more value. The load is worth $60 + 100 + 80 = 240$, and the capacity is filled exactly. No division of these items into the sack does better, because every unit of weight we spend goes on the densest value still available: the moment a fraction of item $3$ replaces any unit already taken, the total can only drop.

The fractional-knapsack greedy fill ($W = 50$). Items enter in density order: item 1 ($6.0$) and item 2 ($5.0$) whole, then $\frac{2}{3}$ of item 3 ($4.0$) tops off the capacity exactly, for value $240$.

In the 0/1 knapsack, each item is all-or-nothing: take it whole or leave it. And here the same density rule collapses. Consider $W = 10$ and:

Item	Weight	Value	Density $v / w$
1	6	12	2.0
2	5	9	1.8
3	5	9	1.8

Greedy by density grabs item $1$ (value $12$, weight $6$), then cannot fit either remaining item, since both need weight $5$ but only $4$ is left. It returns value $12$. Yet items $2$ and $3$ together weigh exactly $10$ and are worth $18$. Greedy captures only two-thirds of the optimal value.

Greedy by density on the 0/1 knapsack ($W = 10$). Left, greedy grabs the densest item $1$ ($w = 6$), stranding $4$ units of unusable capacity for value $12$. Right, the optimum packs items $2$ and $3$ ($w = 5$ each) for value $18$.

This is where the boundary between greedy and dynamic programming falls. The 0/1 knapsack has optimal substructure but lacks the greedy-choice property, so it needs DP, which considers both alternatives (take item $i$ or skip it) rather than committing to one. The fractional version restores the greedy-choice property because a fraction can always absorb the leftover capacity exactly, leaving no stranded space.
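The gap between the two strategies on the three-item instance can be reproduced in a few lines. This is a minimal sketch under stated assumptions: integer weights, items given as `(weight, value)` pairs, and the textbook one-dimensional DP table standing in for the dynamic-programming treatment that a later module develops. The function names are hypothetical.

```python
def greedy_by_density(items: list[tuple[int, int]], capacity: int) -> int:
  """Value of the 0/1 load built by taking whole items in density order."""
  total = 0
  remaining = capacity
  by_density = sorted(items, key=lambda it: it[1] / it[0], reverse=True)
  for weight, value in by_density:
    # all-or-nothing: an item that does not fit whole is skipped.
    if weight <= remaining:
      total += value
      remaining -= weight
  return total


def knapsack_01(items: list[tuple[int, int]], capacity: int) -> int:
  """Optimal 0/1 value: best[c] is the best value using capacity at most c."""
  best = [0] * (capacity + 1)
  for weight, value in items:
    # scan capacities downward so each item is used at most once.
    for c in range(capacity, weight - 1, -1):
      best[c] = max(best[c], best[c - weight] + value)
  return best[capacity]


items = [(6, 12), (5, 9), (5, 9)]  # (weight, value), as in the table above
print(greedy_by_density(items, 10))  # 12
print(knapsack_01(items, 10))        # 18
```

The DP considers both "take" and "skip" for every item at every capacity, which is exactly the look-ahead that the density rule gives up.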

fractional_knapsack.py (Python)

from typing import NamedTuple, Sequence

class Item(NamedTuple):
  """
    One knapsack item: the value gained by taking the whole of it and the\n
    weight a whole unit costs.\n
  """
  value: float
  weight: float

  @property
  def density(self) -> float:
    """
      Value per unit weight — the greedy ranking key.\n
    """
    return self.value / self.weight

class Portion(NamedTuple):
  """
    How much of one item the greedy load took: the item itself and the\n
    fraction in [0, 1] selected.\n
  """
  item: Item
  fraction: float

def fractional_knapsack(
  items: Sequence[Item],
  capacity: float,
) -> tuple[float, list[Portion]]:
  """
    The maximum achievable value within `capacity`, paired with the fractions\n
    taken of each contributing item. Weights are positive; capacity is\n
    non-negative.\n
  """
  # rank items by value density so the densest fill the budget first.
  by_density: list[Item] = sorted(
    items, key=lambda item: item.density, reverse=True
  )

  total_value: float = 0.0
  portions: list[Portion] = []
  remaining: float = capacity

  for item in by_density:
    if remaining <= 0:
      break

    # the whole item fits; take all of it.
    if item.weight <= remaining:
      total_value += item.value
      remaining -= item.weight
      portions.append(Portion(item, 1.0))

    # only a slice fits; take exactly enough to fill the knapsack.
    else:
      fraction: float = remaining / item.weight
      total_value += item.value * fraction
      portions.append(Portion(item, fraction))
      remaining = 0.0

  return total_value, portions
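To check the worked numbers without floating-point rounding, the same density rule can be run with exact rationals. The compact function below is a hypothetical stand-in for the listing above (it returns only the value and takes `(weight, value)` pairs), written so the block runs on its own.

```python
from fractions import Fraction


def fractional_value(items: list[tuple[int, int]], capacity: int) -> Fraction:
  """Best fractional load value; items are (weight, value), weights positive."""
  remaining = Fraction(capacity)
  total = Fraction(0)
  by_density = sorted(
    items, key=lambda it: Fraction(it[1], it[0]), reverse=True
  )
  for weight, value in by_density:
    if remaining == 0:
      break
    # take the whole item if it fits, otherwise just enough to fill up.
    taken = min(Fraction(weight), remaining)
    total += Fraction(value) * taken / weight
    remaining -= taken
  return total


lesson = fractional_value([(10, 60), (20, 100), (30, 120)], 50)
bound = fractional_value([(6, 12), (5, 9), (5, 9)], 10)
print(lesson)  # 240
print(bound)   # 96/5
```

On the 0/1 instance the fractional optimum is $\frac{96}{5} = 19.2$, which sits above the 0/1 optimum of $18$: allowing fractions can only help, so the fractional value is an upper bound on the all-or-nothing one.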

When is greed good? A glimpse of matroids

For a large family of problems there is a theorem of the form "greedy is optimal exactly when…", and its language is the matroid.

A matroid is a pair $(E, I)$ built from a finite ground set $E$ and a family $I$ of independent subsets, satisfying two axioms:

Heredity. If $A \in I$ and $B \subseteq A$, then $B \in I$. (Subsets of independent sets are independent.)
Exchange. If $A, B \in I$ and $|A| < |B|$, then some element $x \in B \setminus A$ has $A \cup \{x\} \in I$. (A smaller independent set can always be grown using an element of a larger one.)

The forests of a graph form a matroid: subsets of a forest are forests, and a smaller forest can always borrow an edge from a larger one without making a cycle.
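On a small ground set both axioms can be checked by brute force. The sketch below is an illustration under assumptions: the helper names, the five-edge graph, and the three-interval instance are all made up for this example, and the exhaustive enumeration is only practical for a handful of elements. For contrast it also tests pairwise-compatible sets of intervals, a set system that satisfies heredity but not exchange.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterable


def is_forest(edges: Iterable[tuple[int, int]]) -> bool:
  """Whether the edges are acyclic, via a throwaway union-find."""
  parent: dict[int, int] = {}

  def find(v: int) -> int:
    parent.setdefault(v, v)
    while parent[v] != v:
      parent[v] = parent[parent[v]]  # path halving
      v = parent[v]
    return v

  for u, v in edges:
    root_u, root_v = find(u), find(v)
    if root_u == root_v:
      return False  # endpoints already connected: this edge closes a cycle
    parent[root_u] = root_v
  return True


def is_compatible(intervals: Iterable[tuple[int, int]]) -> bool:
  """Whether half-open intervals are pairwise non-overlapping."""
  return all(
    a[1] <= b[0] or b[1] <= a[0] for a, b in combinations(list(intervals), 2)
  )


def independent_sets(
  ground: list[Hashable], is_independent: Callable[[tuple], bool]
) -> list[frozenset]:
  """Every independent subset of `ground`, by exhaustive enumeration."""
  return [
    frozenset(subset)
    for size in range(len(ground) + 1)
    for subset in combinations(ground, size)
    if is_independent(subset)
  ]


def satisfies_heredity(family: list[frozenset]) -> bool:
  members = set(family)
  return all(
    frozenset(sub) in members
    for a in family
    for size in range(len(a))
    for sub in combinations(a, size)
  )


def satisfies_exchange(family: list[frozenset]) -> bool:
  members = set(family)
  return all(
    any(a | {x} in members for x in b - a)
    for a in family
    for b in family
    if len(a) < len(b)
  )


# a 4-cycle with one chord.
graph_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
forest_family = independent_sets(graph_edges, is_forest)

# one long interval that conflicts with two short compatible ones.
intervals = [(0, 10), (1, 3), (4, 6)]
interval_family = independent_sets(intervals, is_compatible)

print(satisfies_heredity(forest_family), satisfies_exchange(forest_family))
print(satisfies_heredity(interval_family), satisfies_exchange(interval_family))
```

The interval system fails exchange because the one-element set holding the long interval cannot be grown from the two-element set of short ones.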

The graphic matroid's exchange property. Forest $A$ (two edges) is smaller than some larger forest $B$; an edge of $B$ (blue) joins two of $A$'s trees without making a cycle, growing $A$.

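The headline result is due to Rado and Edmonds:⁵ on a hereditary set system with weighted elements, the greedy algorithm returns a maximum-weight independent set for every choice of non-negative weights exactly when the system is a matroid.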

This theorem is why Kruskal's minimum spanning tree algorithm, which greedily adds the cheapest edge that creates no cycle, is correct: it is greedy on the graphic matroid.

Matroids do not cover every successful greedy algorithm. Activity selection falls outside the theory, since a single long activity can conflict with every member of a larger compatible set, so compatible sets violate the exchange axiom; Huffman coding, our next lesson, falls outside it too. But matroids explain a large class and sometimes reduce the correctness question to a checkable condition. We will not develop the theory further here; the exchange axiom that defines it is the same exchange idea used in our correctness proofs.

matroid_greedy.py (Python)

from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Hashable
from typing import Generic, TypeVar

from graph import Graph
from union_find import UnionFind

GroundElement = TypeVar("GroundElement", bound=Hashable)

class Matroid(ABC, Generic[GroundElement]):
  """
    A matroid over a finite ground set of hashable elements.\n
    Subclasses define which subsets count as independent and how heavy each\n
    element is; the greedy driver below needs nothing more.\n
  """

  @abstractmethod
  def ground_set(self) -> list[GroundElement]:
    """
      Every element of the ground set.\n
    """

  @abstractmethod
  def is_independent(self, subset: set[GroundElement]) -> bool:
    """
      Whether `subset` is an independent set of this matroid.\n
    """

  @abstractmethod
  def weight(self, element: GroundElement) -> float:
    """
      The weight assigned to a single ground element.\n
    """

def greedy_independent_set(
  matroid: Matroid[GroundElement],
) -> list[GroundElement]:
  """
    A maximum-weight independent set, by the Rado-Edmonds greedy rule.\n
    Elements are tried in order of decreasing weight; each is admitted when\n
    the enlarged set stays independent. Elements of non-positive weight never\n
    increase the total, so they are skipped. Returns the chosen elements in\n
    the order admitted.\n
  """
  # try elements heaviest-first; that order is what makes greedy optimal.
  by_weight: list[GroundElement] = sorted(
    matroid.ground_set(),
    key=matroid.weight,
    reverse=True,
  )

  chosen: set[GroundElement] = set()
  ordered: list[GroundElement] = []

  for element in by_weight:
    # elements of non-positive weight never increase the total, so skip them.
    if matroid.weight(element) <= 0:
      continue

    # admit the element only if independence survives the addition.
    candidate: set[GroundElement] = chosen | {element}
    if matroid.is_independent(candidate):
      chosen.add(element)
      ordered.append(element)

  return ordered

class GraphicMatroid(Matroid[tuple[Hashable, Hashable]]):
  """
    The cycle matroid of an undirected graph.\n
    The ground set is the edges; a subset is independent when it is a forest\n
    (acyclic). Greedy on this matroid, with edge weight as ground weight, is\n
    exactly Kruskal's maximum-weight spanning forest.\n
  """

  def __init__(self, graph: Graph[Hashable]) -> None:
    self._graph: Graph[Hashable] = graph
    self._weights: dict[tuple[Hashable, Hashable], float] = {}
    self._edges: list[tuple[Hashable, Hashable]] = []

    # key each edge by a direction-independent pair of its endpoints.
    for edge in graph.edges():
      key = self._canonical(edge.source.label, edge.target.label)
      self._edges.append(key)
      self._weights[key] = edge.weight

  @staticmethod
  def _canonical(
    first: Hashable,
    second: Hashable,
  ) -> tuple[Hashable, Hashable]:
    """
      A direction-independent key for an undirected edge.\n
    """
    return (first, second) if hash(first) <= hash(second) else (second, first)

  def ground_set(self) -> list[tuple[Hashable, Hashable]]:
    return list(self._edges)

  def weight(self, element: tuple[Hashable, Hashable]) -> float:
    return self._weights[element]

  def is_independent(
    self,
    subset: set[tuple[Hashable, Hashable]],
  ) -> bool:
    """
      Whether `subset` of edges forms a forest (no cycle), tested by union-\n
      find: an edge whose endpoints already share a tree closes a cycle.\n
    """
    # union each edge's endpoints; a failed union means they already shared a
    # tree, so this edge closes a cycle.
    components: UnionFind[Hashable] = UnionFind(self._graph.labels)
    for source_label, target_label in subset:
      if not components.union(source_label, target_label):
        return False

    return True
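Stripped of the matroid abstraction, the same greedy loop is Kruskal's algorithm. The self-contained sketch below is an assumption-laden illustration, not the lesson's implementation: it minimizes instead of maximizing, so edges are scanned cheapest-first; it keeps one union-find across the whole scan instead of rebuilding it per independence test; and the function name, the `(weight, u, v)` edge format, and the sample graph are all hypothetical.

```python
def kruskal(
  vertices: list[str], edges: list[tuple[int, str, str]]
) -> list[tuple[int, str, str]]:
  """A minimum spanning forest; edges are (weight, u, v) triples."""
  parent = {v: v for v in vertices}

  def find(v: str) -> str:
    while parent[v] != v:
      parent[v] = parent[parent[v]]  # path halving
      v = parent[v]
    return v

  forest: list[tuple[int, str, str]] = []
  # cheapest edge first: the greedy order for a minimum-weight basis.
  for weight, u, v in sorted(edges):
    root_u, root_v = find(u), find(v)
    # admit the edge only if the chosen set stays a forest.
    if root_u != root_v:
      parent[root_u] = root_v
      forest.append((weight, u, v))
  return forest


sample_edges = [
  (1, "a", "b"),
  (2, "b", "c"),
  (3, "a", "c"),
  (4, "c", "d"),
  (5, "b", "d"),
]
tree = kruskal(["a", "b", "c", "d"], sample_edges)
print(tree)                      # [(1, 'a', 'b'), (2, 'b', 'c'), (4, 'c', 'd')]
print(sum(w for w, _, _ in tree))  # 7
```

Edges `(3, a, c)` and `(5, b, d)` are rejected because each would close a cycle, which is the independence test of the graphic matroid in concrete form.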

graph.py (Python)

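# postponed annotation evaluation lets Edge refer to Vertex before it is defined.
from __future__ import annotations

from collections.abc import Hashable, Iterator
from typing import Generic, Optional, TypeVar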


Label = TypeVar("Label", bound=Hashable)


class Edge(Generic[Label]):
  """
    A directed connection from `source` to `target`, carrying a weight.\n
  """

  def __init__(
    self,
    source: Vertex[Label],
    target: Vertex[Label],
    weight: float = 1.0,
  ) -> None:
    self.source: Vertex[Label] = source
    self.target: Vertex[Label] = target
    self.weight: float = weight

  def __repr__(self) -> str:
    return f"Edge({self.source.label!r} -> {self.target.label!r}, w={self.weight})"


class Vertex(Generic[Label]):
  """
    A graph vertex: a label plus the list of edges leaving it.\n
  """

  def __init__(self, label: Label) -> None:
    self.label: Label = label
    self.outgoing: list[Edge[Label]] = []

  def neighbors(self) -> list[Vertex[Label]]:
    """
      The vertices reachable from this one by a single edge.\n
    """
    return [edge.target for edge in self.outgoing]

  def edge_to(self, label: Label) -> Optional[Edge[Label]]:
    """
      The outgoing edge to the vertex with `label`, or None.\n
    """
    for edge in self.outgoing:
      if edge.target.label == label:
        return edge
    return None

  def __repr__(self) -> str:
    return f"Vertex({self.label!r})"


class Graph(Generic[Label]):
  """
    A graph of Vertex objects linked by Edge objects.\n
    Pass `directed=True` for a digraph; otherwise each `add_edge` inserts\n
    the reverse edge too.\n
  """

  def __init__(self, directed: bool = False) -> None:
    self.directed: bool = directed
    self._vertices: dict[Label, Vertex[Label]] = {}

  def add_vertex(self, label: Label) -> Vertex[Label]:
    """
      Return the vertex for `label`, creating it if it is absent.\n
    """
    # reuse the existing vertex, or mint and register a fresh one.
    vertex = self._vertices.get(label)
    if vertex is None:
      vertex = Vertex(label)
      self._vertices[label] = vertex
    return vertex

  def add_edge(
    self,
    source_label: Label,
    target_label: Label,
    weight: float = 1.0,
  ) -> None:
    """
      Connect two labels (creating either vertex as needed).\n
      Adds the reverse edge as well when the graph is undirected.\n
    """
    source = self.add_vertex(source_label)
    target = self.add_vertex(target_label)

    # link source to target, and mirror it back when undirected.
    source.outgoing.append(Edge(source, target, weight))
    if not self.directed:
      target.outgoing.append(Edge(target, source, weight))

  def vertex(self, label: Label) -> Vertex[Label]:
    """
      The vertex carrying `label` (raises KeyError if absent).\n
    """
    return self._vertices[label]

  @property
  def vertices(self) -> list[Vertex[Label]]:
    """
      Every vertex, in insertion order.\n
    """
    return list(self._vertices.values())

  @property
  def labels(self) -> list[Label]:
    """
      Every vertex label, in insertion order.\n
    """
    return list(self._vertices)

  def edges(self) -> Iterator[Edge[Label]]:
    """
      Each edge once — an undirected edge is yielded a single time.\n
    """
    # track undirected endpoint pairs so each is emitted only once.
    seen: set[frozenset[Label]] = set()

    for vertex in self._vertices.values():
      for edge in vertex.outgoing:
        # skip an undirected edge already yielded from the other endpoint.
        if not self.directed:
          endpoints = frozenset((edge.source.label, edge.target.label))
          if endpoints in seen:
            continue
          seen.add(endpoints)

        yield edge

  def __contains__(self, label: Label) -> bool:
    return label in self._vertices

  def __iter__(self) -> Iterator[Vertex[Label]]:
    return iter(self._vertices.values())

  def __len__(self) -> int:
    return len(self._vertices)

union_find.py (Python)

from collections.abc import Hashable, Iterable
from typing import Generic, TypeVar, cast


Element = TypeVar("Element", bound=Hashable)


class DisjointSetNode(Generic[Element]):
  """
    One element's node: its value, its parent link, and its rank.\n
    A node is its own parent exactly when it is the root of its set.\n
  """

  def __init__(self, value: Element) -> None:
    self.value: Element = value
    self.parent: DisjointSetNode[Element] = self
    self.rank: int = 0

  def __repr__(self) -> str:
    return f"DisjointSetNode({self.value!r})"


class UnionFind(Generic[Element]):
  """
    A collection of disjoint sets over hashable elements.\n
  """

  def __init__(self, elements: int | Iterable[Element] = 0) -> None:
    """
      Seed the structure. An int `n` creates singletons `0..n-1`;\n
      an iterable creates one singleton node per member.\n
    """
    # a seed count `n` means the elements are 0..n-1 (ints standing in for
    # Element); cast keeps the type checker happy about that substitution.
    members: Iterable[Element] = (
      cast("Iterable[Element]", range(elements))
      if isinstance(elements, int)
      else elements
    )
    # one singleton node per seeded member.
    self._nodes: dict[Element, DisjointSetNode[Element]] = {
      value: DisjointSetNode(value) for value in members
    }
    self.count: int = len(self._nodes)

  def add(self, value: Element) -> None:
    """
      Add `value` as a new singleton set if it is absent.\n
    """
    if value not in self._nodes:
      self._nodes[value] = DisjointSetNode(value)
      self.count += 1

  def _find_root(self, value: Element) -> DisjointSetNode[Element]:
    """
      The root node of `value`'s set, compressing the path on the way.\n
    """
    # first pass: climb parent links to the root of the set.
    node = self._nodes[value]
    root = node
    while root.parent is not root:
      root = root.parent

    # second pass: point every node on the path straight at the root.
    while node.parent is not root:
      node.parent, node = root, node.parent

    return root

  def find(self, value: Element) -> Element:
    """
      The representative value of `value`'s set.\n
    """
    return self._find_root(value).value

  def union(self, first: Element, second: Element) -> bool:
    """
      Merge the sets containing `first` and `second`.\n
      Returns False if they already shared a set.\n
    """
    # already in the same set: nothing to merge.
    first_root = self._find_root(first)
    second_root = self._find_root(second)
    if first_root is second_root:
      return False

    # hang the shorter tree under the taller one.
    if first_root.rank < second_root.rank:
      first_root, second_root = second_root, first_root
    second_root.parent = first_root

    # equal ranks: the merged tree grows one level taller.
    if first_root.rank == second_root.rank:
      first_root.rank += 1

    self.count -= 1
    return True

  def connected(self, first: Element, second: Element) -> bool:
    """
      Whether `first` and `second` belong to the same set.\n
    """
    return self._find_root(first) is self._find_root(second)

A recipe for greedy algorithms

Drawing the standard references together, the workflow is always the same:

Cast the problem as a sequence of choices, where each choice leaves a smaller subproblem of the same kind.
Guess a greedy rule — the locally optimal choice. Beware: the obvious rule is often wrong (recall the failed activity-selection rules).
Prove the greedy-choice property with an exchange argument: any optimal solution can be transformed to contain the greedy choice.
Prove optimal substructure and combine, by induction, into a full proof.

If steps 3 and 4 go through, the result is a correct, usually fast, usually simple algorithm. If they do not, use dynamic programming instead.

When greedy is only approximately optimal

CLRS frames greedy as a route to exact optima, and this lesson has kept to that: activity selection and the fractional knapsack are solved to optimality, or greed is abandoned. But the greedy method also serves as an approximation algorithm — a fast heuristic that is provably close to optimal even when finding the true optimum is intractable.

Set cover and the $\ln n$ guarantee. Given a universe of $n$ elements and a family of sets, set cover asks for the fewest sets whose union is everything. It is NP-hard, so no efficient exact algorithm is expected. The natural greedy rule (repeatedly take the set covering the most still-uncovered elements) returns a cover using at most $H_{n} = 1 + \frac{1}{2} + \dots + \frac{1}{n} \leq \ln n + 1$ times as many sets as the optimum.⁶ The $\ln n$ factor is tight: Dinur and Steurer (2014) proved that no polynomial-time algorithm beats $(1 - o(1)) \ln n$ unless $P = NP$, so greedy is essentially the best possible approximation.⁷ The same logarithmic greedy bound governs its twin, vertex cover by the maximum-degree rule, which is why set-cover-shaped problems (facility placement, feature selection, test-suite minimization) are usually attacked greedily first.
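The greedy rule for set cover is short enough to sketch directly. The function name and the six-element instance below are assumptions chosen for illustration; the instance is built so that greedy uses three sets where two suffice, which stays within the $H_{n}$ guarantee.

```python
def greedy_set_cover(universe: set[int], family: list[set[int]]) -> list[int]:
  """Indices of the sets picked by the most-new-elements-first rule."""
  uncovered = set(universe)
  picked: list[int] = []
  while uncovered:
    # the set covering the most still-uncovered elements (first one on ties).
    best = max(range(len(family)), key=lambda i: len(family[i] & uncovered))
    if not family[best] & uncovered:
      raise ValueError("the family does not cover the universe")
    picked.append(best)
    uncovered -= family[best]
  return picked


universe = {1, 2, 3, 4, 5, 6}
family = [
  {1, 2, 4, 5},  # tempting: covers four elements at once
  {1, 2, 3},     # these two sets alone form an optimal cover
  {4, 5, 6},
]

picked = greedy_set_cover(universe, family)
harmonic = sum(1 / i for i in range(1, len(universe) + 1))
print(picked)                    # [0, 1, 2]
print(len(picked) <= harmonic * 2)  # True
```

Greedy grabs the four-element set first and then needs both remaining sets to mop up elements $3$ and $6$, so it returns three sets against an optimum of two.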

Greedy set cover. Each round takes the set covering the most still-uncovered elements (blue); after three greedy picks the universe of $10$ elements is covered. The greedy cover is at most $H_{n} \approx \ln n$ times the optimum.

Online greedy and the competitive ratio. When the input arrives one piece at a time and each decision is irrevocable (an online problem), greedy is often the only option, and its quality is measured by the competitive ratio, the worst-case ratio of the online cost to the best offline (all-knowing) cost. The canonical case is caching, or paging: on a cache miss, which page do you evict? Sleator and Tarjan (1985) showed that any deterministic online eviction policy is at best $k$-competitive for a cache of size $k$, and that the greedy-flavored Least-Recently-Used policy achieves that optimal $k$; their competitive-analysis framework became the standard one for online algorithms.⁸ Greedy, in short, is both a route to exact optima and the natural, sometimes provably optimal, strategy when the input is revealed online.
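A small simulation makes the competitive ratio concrete. The sketch below is an illustration under assumptions: the function names and the cyclic request sequence are made up for this example, and the offline benchmark is the farthest-in-future eviction rule (Belady's rule), swapped in here as the all-knowing comparison point; the text above does not name it.

```python
from collections import OrderedDict


def lru_faults(requests: list[int], k: int) -> int:
  """Cache misses under Least-Recently-Used with a cache of size k."""
  cache: OrderedDict = OrderedDict()
  faults = 0
  for page in requests:
    if page in cache:
      cache.move_to_end(page)  # mark as most recently used
      continue
    faults += 1
    if len(cache) >= k:
      cache.popitem(last=False)  # evict the least recently used page
    cache[page] = None
  return faults


def next_use(requests: list[int], page: int, start: int) -> int:
  """Index of the next request for `page` at or after `start`, else len."""
  for index in range(start, len(requests)):
    if requests[index] == page:
      return index
  return len(requests)


def offline_faults(requests: list[int], k: int) -> int:
  """Cache misses when evicting the page whose next use is farthest away."""
  cache: set[int] = set()
  faults = 0
  for i, page in enumerate(requests):
    if page in cache:
      continue
    faults += 1
    if len(cache) >= k:
      victim = max(cache, key=lambda p: next_use(requests, p, i + 1))
      cache.remove(victim)
    cache.add(page)
  return faults


k = 3
requests = [0, 1, 2, 3] * 6  # k + 1 pages requested in a cycle

online = lru_faults(requests, k)
offline = offline_faults(requests, k)
print(online, offline)  # 24 10
```

Cycling through $k + 1$ pages makes LRU miss on every request, while the offline rule misses only once every $k$ requests after warm-up; the observed ratio stays below the factor of $k$ that the competitive bound allows.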

Takeaways

A greedy algorithm makes the locally optimal choice at each step and never reconsiders. It is fast and simple, when it is correct.
Correctness needs two properties: the greedy-choice property (some optimal solution contains the greedy choice) and optimal substructure (what remains is the same problem, smaller).
Activity selection by earliest finish time is the canonical win; its proof is the template exchange argument plus induction. Cost: $\Theta(n \log n)$, all in the sort.
The 0/1 knapsack is the canonical failure: it lacks the greedy-choice property, so greed strands capacity and the problem needs dynamic programming instead. Its fractional cousin restores the property and yields to greed.
Matroids characterize a broad class where greedy is provably optimal (Kruskal's MST among them), a formalization of the exchange argument itself.

¹ Erickson, Ch. 4, Greedy Algorithms: the warning that most greedy strategies are wrong and must be proved correct, not merely tested.
² CLRS, Ch. 16, Greedy Algorithms (§16.2): the greedy-choice property as one of the two ingredients licensing a greedy algorithm.
³ CLRS, Ch. 16, Greedy Algorithms (§16.1): the activity-selection problem solved by repeatedly choosing the earliest-finishing compatible activity.
⁴ Skiena, §1.4 & §5, Heuristics; Weighted Graph Algorithms: the fractional knapsack solved greedily by value density.
⁵ CLRS, Ch. 16, Greedy Algorithms (§16.4): the Rado–Edmonds theorem that greedy yields a maximum-weight independent set exactly when the structure is a matroid.
⁶ CLRS, Ch. 35, Approximation Algorithms (§35.3): the greedy set-cover algorithm and its proof of an $H_{n} \leq \ln n + 1$ approximation ratio. The original analysis is Johnson, D. S. (1974), Approximation algorithms for combinatorial problems, J. Computer and System Sciences 9(3), 256–278.
⁷ Dinur, I. & Steurer, D. (2014), Analytical approach to parallel repetition, STOC 2014, 624–633. Establishes that set cover cannot be approximated to better than $(1 - o(1)) \ln n$ in polynomial time unless $P = NP$, matching the greedy bound.
⁸ Sleator, D. D. & Tarjan, R. E. (1985), Amortized efficiency of list update and paging rules, Communications of the ACM 28(2), 202–208. Introduces competitive analysis and proves LRU is $k$-competitive for a size-$k$ cache, the best possible for a deterministic policy.
