Huffman Codes

Suppose we want to store or transmit a file of text drawn from some alphabet of symbols, letters being the obvious case. A fixed-length code assigns every symbol a bit string of the same length: with $6$ symbols we would spend $3$ bits each ( $⌈ log_{2} 6 ⌉ = 3$ ), regardless of how often each symbol appears. But real text is lopsided. In English, e and t are everywhere while q and z are rare. It is wasteful to spend as many bits on z as on e.

A variable-length code exploits this skew: give the frequent symbols short codewords and the rare symbols long ones, so the total bit count drops. Huffman's 1952 algorithm finds the variable-length code that compresses a given file as much as any such code possibly can, and it does so with a simple greedy rule.¹ This lesson is the payoff of the greedy method: a non-obvious algorithm, a short correctness proof, and a result used billions of times a day inside JPEG, MP3, gzip, and PNG.

Prefix-free codes

Variable-length codes carry a hazard. If e is 0 and t is 01, then after reading the first 0 of the stream 001 a decoder cannot tell whether it has just seen e or only the start of t, because the codeword for e is a prefix of the codeword for t. To decode unambiguously without separator symbols or lookahead, we insist on a prefix-free code (also called a prefix code): no codeword is a prefix of any other.²

Prefix-freeness gives instant, unambiguous decoding: read bits left to right, and the moment the bits so far match a codeword, that codeword is the only possible symbol, so emit it and start fresh. Restricting to prefix-free codes costs nothing in compression: for any uniquely decodable code there is a prefix-free code at least as good, so we lose no optimality by considering only these.
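The decoding rule just described can be sketched with a plain dictionary standing in for the tree. This is a minimal illustration rather than the lesson's implementation: the helper names `is_prefix_free` and `decode_greedy` and both toy codebooks are assumptions made up for the example.

```python
def is_prefix_free(codebook: dict[str, str]) -> bool:
  """Return True when no codeword is a prefix of a different symbol's codeword."""
  words = list(codebook.values())
  return not any(
    i != j and longer.startswith(shorter)
    for i, shorter in enumerate(words)
    for j, longer in enumerate(words)
  )


def decode_greedy(bits: str, codebook: dict[str, str]) -> list[str]:
  """Emit a symbol the moment the bits read so far match a codeword."""
  by_codeword = {word: symbol for symbol, word in codebook.items()}
  symbols: list[str] = []
  pending = ""
  for bit in bits:
    pending += bit
    if pending in by_codeword:
      symbols.append(by_codeword[pending])
      pending = ""
  if pending:
    raise ValueError("stream ended mid-codeword")
  return symbols


# hypothetical toy codebooks, chosen only for illustration.
not_prefix_free = {"e": "0", "t": "01"}   # "0" is a prefix of "01"
prefix_free = {"e": "0", "t": "10", "q": "11"}

print(is_prefix_free(not_prefix_free))      # False
print(is_prefix_free(prefix_free))          # True
print(decode_greedy("01011", prefix_free))  # ['e', 't', 'q']
```

The greedy decoder is only safe on a prefix-free codebook; run on the first codebook it would emit e as soon as it saw a 0 and could never produce t.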

Every prefix-free code corresponds to a binary tree. Symbols sit at the leaves; the path from the root to a leaf spells its codeword, taking 0 for a left edge and 1 for a right edge. Because symbols are only at leaves, no codeword can be a prefix of another: a prefix would mean one symbol's leaf lies on the path to another's, impossible when both are leaves. The depth of a leaf is its codeword's length.

A prefix-free code as a binary tree. Symbols sit only at leaves; the root-to-leaf path spells the codeword (0 left, 1 right). No codeword is a prefix of another because no leaf lies on the path to another.

The compression problem

Let the alphabet be $C$ , and let symbol $c \in C$ occur with frequency $c . freq$ (its count, or its probability) in the file. In a code tree $T$ , let $d_{T} (c)$ be the depth of $c$ 's leaf, the number of bits in its codeword. The total number of bits to encode the whole file is the cost of the tree:

$$B(T) = \sum_{c \in C} c.freq \cdot d_{T}(c).$$

Huffman's algorithm

Huffman's greedy insight is to build the tree bottom-up, starting from the question: which two symbols belong deepest in the tree? The two least frequent ones, since multiplying their long codewords by small frequencies costs little.³ So make the two rarest symbols siblings at the bottom, merge them into a single super-symbol whose frequency is their sum, and repeat on the smaller alphabet. Each merge fuses two nodes into one, so after $n - 1$ merges a single tree remains.

A min-priority queue keyed on frequency makes the two least frequent cheap to extract.

Algorithm 1: Huffman(C), build an optimal prefix-free code tree

  n ← |C|
  Q ← a min-priority queue holding all symbols of C, keyed on freq
  for i ← 1 to n − 1 do
      allocate a new internal node z
      z.left ← x ← Extract-Min(Q)      // rarest remaining
      z.right ← y ← Extract-Min(Q)     // next rarest
      z.freq ← x.freq + y.freq
      Insert(Q, z)                     // re-insert merged super-symbol
  return Extract-Min(Q)                // last node is the root

Each iteration removes two nodes and inserts one, shrinking the queue by one; the single survivor after $n - 1$ rounds is the root of the finished tree. Reading the tree from the root gives every symbol's codeword.

huffman.py (Python)

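# postpone annotation evaluation so HuffmanNode can name itself in its own
# method signatures; this import must come before all other statements.
from __future__ import annotations

import heapq
from collections.abc import Hashable, Iterable, Mapping
from typing import Generic, Optional, TypeVar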


Symbol = TypeVar("Symbol", bound=Hashable)


class HuffmanNode(Generic[Symbol]):
  """
    One node of a Huffman code tree.\n
    A leaf carries a `symbol` and its `frequency`; an internal node carries no\n
    symbol, a frequency equal to the sum of its children, and `left`/`right`\n
    links. The tie-breaking `order` gives every node a stable, distinct heap\n
    key so equal frequencies never force a comparison between nodes.\n
  """

  def __init__(
    self,
    frequency: float,
    order: int,
    symbol: Optional[Symbol] = None,
    left: Optional[HuffmanNode[Symbol]] = None,
    right: Optional[HuffmanNode[Symbol]] = None,
  ) -> None:
    self.frequency: float = frequency
    self.order: int = order
    self.symbol: Optional[Symbol] = symbol
    self.left: Optional[HuffmanNode[Symbol]] = left
    self.right: Optional[HuffmanNode[Symbol]] = right

  @property
  def is_leaf(self) -> bool:
    """
      Whether this node holds a symbol (no children).\n
    """
    return self.left is None and self.right is None

  def __lt__(self, other: HuffmanNode[Symbol]) -> bool:
    """
      Order nodes by frequency, breaking ties by insertion order so the\n
      min-heap is fully deterministic and never compares symbols.\n
    """
    if self.frequency != other.frequency:
      return self.frequency < other.frequency
    return self.order < other.order

  def __repr__(self) -> str:
    if self.is_leaf:
      return f"HuffmanNode(symbol={self.symbol!r}, freq={self.frequency})"
    return f"HuffmanNode(internal, freq={self.frequency})"


def build_huffman_tree(
  frequencies: Mapping[Symbol, float],
) -> Optional[HuffmanNode[Symbol]]:
  """
    Build an optimal prefix-free code tree from symbol frequencies.\n
    Seeds a min-priority queue with one leaf per symbol, then performs n - 1\n
    merges: each pops the two least-frequent nodes, joins them under a new\n
    internal node whose frequency is their sum, and pushes the merge back. The\n
    surviving node is the root. Returns None for an empty alphabet.\n
    Runs in O(n log n).\n
  """
  if not frequencies:
    return None

  # seed the heap with one leaf per symbol.
  counter: int = 0
  heap: list[HuffmanNode[Symbol]] = []
  for symbol, frequency in frequencies.items():
    heapq.heappush(heap, HuffmanNode(frequency, counter, symbol=symbol))
    counter += 1

  # n - 1 merges fuse the leaves into a single tree.
  while len(heap) > 1:
    # pop the two least-frequent nodes.
    rarest: HuffmanNode[Symbol] = heapq.heappop(heap)
    next_rarest: HuffmanNode[Symbol] = heapq.heappop(heap)

    # join them under a new internal node and push it back.
    merged: HuffmanNode[Symbol] = HuffmanNode(
      rarest.frequency + next_rarest.frequency,
      counter,
      left=rarest,
      right=next_rarest,
    )
    counter += 1
    heapq.heappush(heap, merged)

  return heap[0]

Building a Huffman tree by hand

Take a six-symbol alphabet with these frequencies (in thousands of occurrences), the classic CLRS example:

Symbol	`a`	`b`	`c`	`d`	`e`	`f`
Frequency	45	13	12	16	9	5

We repeatedly merge the two smallest frequencies:

Merge f $(5)$ and e $(9)$ → node $(14)$ .
Merge c $(12)$ and b $(13)$ → node $(25)$ .
Merge $(14)$ and d $(16)$ → node $(30)$ .
Merge $(25)$ and $(30)$ → node $(55)$ .
Merge a $(45)$ and $(55)$ → root $(100)$ .

Each step extracts the two smallest weights and re-inserts their sum, so the queue loses one element per round until a single root remains.
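The hand trace can be reproduced with a few lines of code. This sketch tracks only the weights, not the tree shape; the helper name `merge_trace` is an assumption for illustration, and ties (none occur in this example) would be broken however the heap happens to order them.

```python
import heapq


def merge_trace(frequencies: dict[str, int]) -> list[tuple[int, int, int]]:
  """Return (smallest, next_smallest, their_sum) for each Huffman merge."""
  heap = list(frequencies.values())
  heapq.heapify(heap)
  trace: list[tuple[int, int, int]] = []
  while len(heap) > 1:
    first = heapq.heappop(heap)
    second = heapq.heappop(heap)
    trace.append((first, second, first + second))
    heapq.heappush(heap, first + second)
  return trace


example = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
for first, second, total in merge_trace(example):
  print(f"merge {first} and {second} -> {total}")
```

The five sums are 14, 25, 30, 55 and 100. They add up to 224, which is also the cost of the finished tree, because every merge pushes all the leaves beneath it one level deeper.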

The five priority-queue merges that build the Huffman tree. Each step extracts the two least-frequent nodes and inserts their sum; the queue shrinks by one until a single root of weight 100 remains.

The resulting tree, with left edges labeled 0 and right edges 1, is:

Huffman code tree for the six-symbol example with edges labeled 0 and 1.

Reading root-to-leaf gives the codewords:

Symbol	`a`	`b`	`c`	`d`	`e`	`f`
Codeword	`0`	`101`	`100`	`111`	`1101`	`1100`

The frequent a gets a single bit; the rare e and f get four. The cost is

$$B(T) = 45 \cdot 1 + 13 \cdot 3 + 12 \cdot 3 + 16 \cdot 3 + 9 \cdot 4 + 5 \cdot 4 = 224$$

thousand bits. A fixed-length $3$-bit code would spend $3 \cdot (45 + 13 + 12 + 16 + 9 + 5) = 300$ thousand bits, so Huffman saves about $25\%$, and no prefix-free code does better.
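The arithmetic is easy to check mechanically. The sketch below copies the frequencies and codewords from the tables above; the function names `code_cost` and `fixed_length_cost` are assumptions introduced only for this example.

```python
FREQUENCIES = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
CODEBOOK = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def code_cost(frequencies: dict[str, int], codebook: dict[str, str]) -> int:
  """B(T): each frequency times its codeword length, summed over the alphabet."""
  return sum(freq * len(codebook[symbol]) for symbol, freq in frequencies.items())


def fixed_length_cost(frequencies: dict[str, int]) -> int:
  """Cost of a fixed-length code spending ceil(log2 n) bits on every symbol."""
  bits_per_symbol = max(1, (len(frequencies) - 1).bit_length())
  return bits_per_symbol * sum(frequencies.values())


huffman_bits = code_cost(FREQUENCIES, CODEBOOK)
fixed_bits = fixed_length_cost(FREQUENCIES)
saving = 1 - huffman_bits / fixed_bits
print(huffman_bits, fixed_bits, f"{saving:.1%}")  # 224 300 25.3%
```

Using `bit_length` rather than a floating-point logarithm keeps the fixed-length width exact for any alphabet size.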

Encoding concatenates codewords with nothing between them. The word face becomes 1100 0 100 1101, the twelve-bit stream 110001001101. Decoding runs the bits back through the tree: start at the root, turn left on 0 and right on 1, and the instant a leaf is reached emit its symbol and jump back to the root. Prefix-freeness is what makes this unambiguous: a leaf is reached exactly when a whole codeword has been consumed, never mid-codeword.

Decoding the stream 110001001101 with the example code. Each root-to-leaf descent consumes one codeword; the bits partition uniquely into f, a, c, e with no separators.
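The round trip for face can be checked with a small self-contained sketch. It rebuilds the code tree from the codeword table as nested dictionaries instead of node objects; the names `build_trie`, `encode` and `decode` are assumptions for this example, and symbols are assumed to be single characters.

```python
from typing import Any

# codebook copied from the worked example above.
CODEBOOK = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def build_trie(codebook: dict[str, str]) -> dict[str, Any]:
  """Rebuild the code tree: inner nodes are dicts keyed by bit, leaves are symbols."""
  root: dict[str, Any] = {}
  for symbol, word in codebook.items():
    node = root
    for bit in word[:-1]:
      node = node.setdefault(bit, {})
    node[word[-1]] = symbol
  return root


def encode(text: str, codebook: dict[str, str]) -> str:
  """Concatenate codewords with nothing between them."""
  return "".join(codebook[symbol] for symbol in text)


def decode(bits: str, trie: dict[str, Any]) -> str:
  """Walk down from the root, emitting a symbol and restarting at each leaf."""
  symbols: list[str] = []
  node: Any = trie
  for bit in bits:
    node = node[bit]
    if isinstance(node, str):
      symbols.append(node)
      node = trie
  if node is not trie:
    raise ValueError("stream ended mid-codeword")
  return "".join(symbols)


bits = encode("face", CODEBOOK)
print(bits, len(bits))                     # 110001001101 12
print(decode(bits, build_trie(CODEBOOK)))  # face
```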

huffman.py (Python)

from collections import Counter

# Continues huffman.py from the first listing, which supplies the remaining
# imports, the Symbol type variable, HuffmanNode, and build_huffman_tree.

def build_codebook(
  root: Optional[HuffmanNode[Symbol]],
) -> dict[Symbol, str]:
  """
    Map each symbol to its codeword by walking the tree, taking '0' on a left\n
    edge and '1' on a right edge. A single-symbol alphabet has no edges, so its\n
    lone symbol gets the one-bit codeword '0' (a valid, decodable code).\n
  """
  codebook: dict[Symbol, str] = {}
  if root is None:
    return codebook

  if root.is_leaf:
    # degenerate tree: assign one bit so the code is still usable.
    assert root.symbol is not None
    codebook[root.symbol] = "0"
    return codebook

  def walk(node: HuffmanNode[Symbol], codeword: str) -> None:
    # a leaf ends the path: record its codeword.
    if node.is_leaf:
      assert node.symbol is not None
      codebook[node.symbol] = codeword
      return

    # descend, appending 0 for a left edge and 1 for a right edge.
    if node.left is not None:
      walk(node.left, codeword + "0")
    if node.right is not None:
      walk(node.right, codeword + "1")

  walk(root, "")
  return codebook


class HuffmanCode(Generic[Symbol]):
  """
    An optimal prefix-free code over a fixed alphabet.\n
    Built from explicit frequencies or learned from a sample sequence, it\n
    exposes the code tree, the per-symbol codebook, encode/decode, and the\n
    total cost B(T) of the tree.\n
  """

  def __init__(self, frequencies: Mapping[Symbol, float]) -> None:
    self.frequencies: dict[Symbol, float] = dict(frequencies)
    self.root: Optional[HuffmanNode[Symbol]] = build_huffman_tree(frequencies)
    self.codebook: dict[Symbol, str] = build_codebook(self.root)

  @classmethod
  def from_data(cls, data: Iterable[Symbol]) -> HuffmanCode[Symbol]:
    """
      Build a code whose frequencies are the symbol counts in `data`.\n
    """
    return cls(Counter(data))

  def encode(self, data: Iterable[Symbol]) -> str:
    """
      Concatenate the codewords of `data` into one bit string.\n
      Every symbol must belong to the code's alphabet.\n
    """
    # look up each symbol's codeword, rejecting any outside the alphabet.
    pieces: list[str] = []
    for symbol in data:
      if symbol not in self.codebook:
        raise KeyError(f"symbol {symbol!r} is not in the code's alphabet")
      pieces.append(self.codebook[symbol])

    return "".join(pieces)

  def decode(self, bits: str) -> list[Symbol]:
    """
      Recover the original symbols from a bit string produced by `encode`.\n
      Reads bits left to right, descending the tree until a leaf is reached,\n
      then emits that symbol and restarts at the root.\n
    """
    symbols: list[Symbol] = []
    if self.root is None:
      if bits:
        raise ValueError("cannot decode bits with an empty code")
      return symbols

    # a one-symbol code has no internal nodes: each bit is one symbol.
    if self.root.is_leaf:
      assert self.root.symbol is not None
      return [self.root.symbol for _ in bits]

    # descend one edge per bit; at each leaf emit a symbol and restart.
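    node: Optional[HuffmanNode[Symbol]] = self.root
    for bit in bits:
      # reject anything other than '0' or '1' instead of treating it as '1'.
      if bit not in ("0", "1"):
        raise ValueError(f"invalid bit {bit!r} in encoded stream")
      node = node.left if bit == "0" else node.right
      if node is None:
        raise ValueError("malformed code tree: internal node is missing a child")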

      if node.is_leaf:
        assert node.symbol is not None
        symbols.append(node.symbol)
        node = self.root

    # a clean decode lands back at the root with no leftover bits.
    if node is not self.root:
      raise ValueError("encoded stream ended mid-codeword")
    return symbols

  def cost(self) -> float:
    """
      The total cost B(T) = sum of freq * depth over all leaves: the number of\n
      bits to encode a file with these frequencies under this code.\n
    """
    return sum(
      self.frequencies[symbol] * len(codeword)
      for symbol, codeword in self.codebook.items()
    )

Why Huffman is optimal

Huffman is a greedy algorithm, so its proof follows the template from the previous lesson exactly: a greedy-choice property proved by an exchange argument, then optimal substructure to close the induction.⁴

The greedy choice is safe

Lemma (greedy choice). Let $x$ and $y$ be the two least-frequent symbols in $C$. Then some optimal tree has $x$ and $y$ as sibling leaves at maximum depth.

Proof (exchange argument). Let $T$ be any optimal tree. Let $a$ and $b$ be two sibling leaves at the deepest level of $T$ (an optimal tree is full, meaning every internal node has two children, so its deepest leaves come in sibling pairs). Without loss of generality assume $a.freq \leq b.freq$ and $x.freq \leq y.freq$. Since $x$ and $y$ are globally least frequent, $x.freq \leq a.freq$ and $y.freq \leq b.freq$.

Form $T'$ by swapping $x$ with $a$, and $T''$ by then swapping $y$ with $b$. We show neither swap increases the cost. Moving $x$ down to depth $d_{T}(a)$ and $a$ up to depth $d_{T}(x)$ changes the cost by

$$B(T) - B(T') = (a.freq - x.freq)\,(d_{T}(a) - d_{T}(x)) \geq 0,$$

because $a$ is at least as frequent as $x$ ($a.freq - x.freq \geq 0$) and $a$ is at least as deep as $x$ ($d_{T}(a) - d_{T}(x) \geq 0$). So $B(T') \leq B(T)$. The same argument gives $B(T'') \leq B(T')$. Since $T$ was optimal, $T''$ is optimal too, and in $T''$ the symbols $x$ and $y$ are sibling leaves at maximum depth. $□$
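For readers who want the algebra behind the displayed difference, the expansion can be written out as follows. Only the leaves $x$ and $a$ change depth in the first swap, so every other term of the two sums cancels; nothing beyond the definitions above is assumed.

```latex
\begin{align*}
B(T) - B(T')
  &= \bigl[\,x.freq \cdot d_T(x) + a.freq \cdot d_T(a)\,\bigr]
   - \bigl[\,x.freq \cdot d_T(a) + a.freq \cdot d_T(x)\,\bigr] \\
  &= a.freq \,\bigl(d_T(a) - d_T(x)\bigr) - x.freq \,\bigl(d_T(a) - d_T(x)\bigr) \\
  &= (a.freq - x.freq)\,\bigl(d_T(a) - d_T(x)\bigr) \;\geq\; 0 .
\end{align*}
```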

The intuition is the exchange argument in one sentence: the rarest symbols belong deepest, so pushing them down and pulling frequent symbols up can never cost more.

The two swaps are easiest to see side by side. In any optimal tree $T$ the deepest sibling pair holds some leaves $a, b$; the globally rarest symbols $x, y$ may sit higher up. Exchanging $x$ with $a$ and $y$ with $b$ sends the rare symbols to the bottom and lifts the more frequent ones, and the cost cannot rise, because each moved-down leaf is no more frequent than the leaf it trades places with.

The greedy-choice exchange for Huffman. Left, an optimal tree $T$ with deepest siblings $a, b$ and the rarest symbols $x, y$ sitting higher. Right, after swapping $x \leftrightarrow a$ and $y \leftrightarrow b$, the rarest symbols are deepest; cost cannot rise since rarer leaves moved down and more frequent leaves moved up.

Optimal substructure

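Optimal substructure is the second lemma: once the two least-frequent symbols $x$ and $y$ are merged into a super-symbol $z$ with $z.freq = x.freq + y.freq$, an optimal tree for the smaller alphabet becomes an optimal tree for $C$ when the leaf $z$ is replaced by an internal node with children $x$ and $y$. Together the two lemmas give the theorem by induction on $|C|$.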
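The cost identity that drives optimal substructure can be written out explicitly. The notation here is an assumption introduced for this sketch, following the usual textbook presentation: $\hat{C}$ is $C$ with $x$ and $y$ replaced by a merged symbol $z$ of frequency $x.freq + y.freq$, $\hat{T}$ is a tree for $\hat{C}$, and $T$ is $\hat{T}$ with the leaf $z$ replaced by an internal node whose children are $x$ and $y$.

```latex
\begin{align*}
d_T(x) = d_T(y) &= d_{\hat T}(z) + 1, \\
x.freq \cdot d_T(x) + y.freq \cdot d_T(y)
  &= (x.freq + y.freq)\,\bigl(d_{\hat T}(z) + 1\bigr) \\
  &= z.freq \cdot d_{\hat T}(z) + (x.freq + y.freq), \\
B(T) &= B(\hat T) + x.freq + y.freq .
\end{align*}
```

The two costs differ by a constant that does not depend on the shape of the tree, so among trees in which $x$ and $y$ are siblings, minimizing one cost minimizes the other. That is the step the induction needs.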

Running time

The cost is dominated by the priority-queue operations. Building the initial min-heap from $n$ symbols takes $O(n)$. The loop runs $n - 1$ times, and each iteration does two Extract-Min operations and one Insert, each $O(\log n)$ on a binary heap. Hence

$$T(n) = O(n) + (n - 1) \cdot O(\log n) = O(n \log n).$$

If the frequencies arrive already sorted, two simple FIFO queues replace the heap (one of original leaves, one of merged nodes, both kept in nondecreasing frequency), and each "extract the two smallest" step is $O(1)$, giving an $O(n)$ algorithm. The $O(n \log n)$ bound, like activity selection's, is really the cost of getting the symbols into sorted order.

huffman.py (Python)

# Continues huffman.py from the first listing, which supplies the imports,
# the Symbol type variable, and HuffmanNode.


def two_queue_huffman_tree(
  sorted_symbols: list[tuple[Symbol, float]],
) -> Optional[HuffmanNode[Symbol]]:
  """
    Build the Huffman tree in O(n) when frequencies arrive pre-sorted.\n
    With the leaves already in nondecreasing frequency, two FIFO queues — one\n
    of original leaves, one of merged internal nodes — both stay sorted, so the\n
    two least-frequent nodes are always at the two queue fronts and each\n
    "extract two smallest" is O(1). `sorted_symbols` must be in nondecreasing\n
    frequency order.\n
  """
  if not sorted_symbols:
    return None

  from collections import deque

  # one FIFO of original leaves (already sorted), one of merged internals.
  counter: int = 0
  leaves: deque[HuffmanNode[Symbol]] = deque()
  for symbol, frequency in sorted_symbols:
    leaves.append(HuffmanNode(frequency, counter, symbol=symbol))
    counter += 1
  merges: deque[HuffmanNode[Symbol]] = deque()

  def take_smallest() -> HuffmanNode[Symbol]:
    """
      Pop the smaller of the two queue fronts, preferring leaves on a tie so\n
      the construction stays deterministic.\n
    """
    # if one queue is empty, the other front is the smallest.
    if not merges:
      return leaves.popleft()
    if not leaves:
      return merges.popleft()

    # otherwise compare fronts, preferring leaves on a tie.
    if leaves[0].frequency <= merges[0].frequency:
      return leaves.popleft()
    return merges.popleft()

  while len(leaves) + len(merges) > 1:
    # take the two least-frequent nodes off the queue fronts.
    first: HuffmanNode[Symbol] = take_smallest()
    second: HuffmanNode[Symbol] = take_smallest()

    # merge them and enqueue the new internal node.
    merged: HuffmanNode[Symbol] = HuffmanNode(
      first.frequency + second.frequency,
      counter,
      left=first,
      right=second,
    )
    counter += 1
    merges.append(merged)

  return merges[0] if merges else leaves[0]

Entropy, arithmetic coding, and the whole-bit floor

Huffman coding is optimal among codes that assign each symbol a whole number of bits independently. This section places it against the theoretical floor and against the codes that go beyond it.

The entropy bound. Shannon's source coding theorem (1948) sets the floor: no uniquely decodable code can average fewer than the entropy $H = \sum_{c} p_{c} \log_{2} (1/p_{c})$ bits per symbol, and a Huffman code always lands within one bit of it, $H \leq \bar{L}_{Huffman} < H + 1$.⁵ The example above shows how small the gap can be: its entropy is about $2.22$ bits per symbol while Huffman spends $\bar{L} = 224/100 = 2.24$, within about $0.02$ bits of the floor, because the frequencies are roughly powers of $\frac{1}{2}$. The one-bit slack becomes visible only on heavily skewed sources.
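The entropy and the Huffman average for the six-symbol example can be computed directly. This is a small sketch using the frequencies and codeword lengths from the worked example; the function names `entropy` and `average_length` are assumptions for illustration.

```python
import math

FREQUENCIES = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
CODE_LENGTHS = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}


def entropy(frequencies: dict[str, int]) -> float:
  """Shannon entropy in bits per symbol of the empirical distribution."""
  total = sum(frequencies.values())
  return sum(
    (freq / total) * math.log2(total / freq)
    for freq in frequencies.values()
    if freq > 0
  )


def average_length(frequencies: dict[str, int], lengths: dict[str, int]) -> float:
  """Average codeword length in bits per symbol."""
  total = sum(frequencies.values())
  return sum(freq * lengths[symbol] for symbol, freq in frequencies.items()) / total


h = entropy(FREQUENCIES)
avg = average_length(FREQUENCIES, CODE_LENGTHS)
print(f"H = {h:.2f}, Huffman = {avg:.2f}")  # H = 2.22, Huffman = 2.24
```

The two numbers satisfy the bound $H \leq \bar{L} < H + 1$ with about 0.02 bits per symbol to spare above the floor.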

The whole-bit floor. When a symbol's ideal codeword length $log_{2} (1/ p)$ is fractional — for a symbol of probability $0.9$ , ideally $0.15$ bits — Huffman must round up to a full bit, wasting the difference. Arithmetic coding (Rissanen & Langdon, 1979) sidesteps the integer-bit floor by encoding the entire message as a single fraction in $[0, 1)$ , so a symbol can cost a fractional number of bits and the total approaches $H$ arbitrarily closely.⁶ Modern asymmetric numeral systems (Duda, 2009) match arithmetic coding's ratio at Huffman-like speed and are now used in Zstandard, LZFSE, and JPEG XL.⁷
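A two-symbol source makes the rounding loss concrete. In this sketch the probability 0.9 comes from the text, while the complementary symbol of probability 0.1 and the function names are assumptions added for illustration. With only two symbols, any prefix-free code must spend exactly one bit on each.

```python
import math


def ideal_length(p: float) -> float:
  """Ideal (possibly fractional) codeword length log2(1/p) in bits."""
  return math.log2(1 / p)


def binary_entropy(p: float) -> float:
  """Entropy in bits per symbol of a two-symbol source with probabilities p and 1 - p."""
  return p * ideal_length(p) + (1 - p) * ideal_length(1 - p)


p = 0.9
huffman_average = 1.0  # two symbols: one gets '0', the other '1'
print(f"ideal length of the common symbol: {ideal_length(p):.3f} bits")  # 0.152
print(f"entropy: {binary_entropy(p):.3f} bits per symbol")              # 0.469
print(f"Huffman overhead: {huffman_average - binary_entropy(p):.3f}")    # 0.531
```

Here Huffman spends more than twice the entropy, which is the kind of skewed source where arithmetic coding or ANS pays off.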

Where Huffman sits. The entropy $H$ is the hard floor (Shannon); Huffman lands within one bit of it, tight on near-dyadic sources and slack on skewed ones; arithmetic coding / ANS close the remaining gap toward $H$.

Where it is still used. Huffman's simplicity, speed, and provable optimality within its class keep it embedded in DEFLATE (gzip, PNG, ZIP), JPEG, and MP3 to this day, usually as the final entropy-coding stage after a modeling transform. Two engineering variants matter in practice: canonical Huffman codes store the codebook as just the per-symbol lengths (the codewords are then reconstructed by a fixed rule), shrinking the header DEFLATE must transmit; and length-limited Huffman (the Package-Merge algorithm of Larmore & Hirschberg, 1990) caps the maximum codeword length so decode tables stay small, at a tiny cost in ratio.⁸ Huffman remains the textbook proof that a greedy algorithm, properly justified, can be exactly optimal, not merely a good heuristic.
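The canonical idea can be sketched in a few lines. This follows the commonly described rule (shorter codewords first, ties broken by symbol order, consecutive binary values within a length), which is the idea DEFLATE builds on, but it is not a DEFLATE implementation. The function name `canonical_codebook` is an assumption, and the lengths are read off the worked example's codewords.

```python
def canonical_codebook(lengths: dict[str, int]) -> dict[str, str]:
  """Rebuild codewords from per-symbol lengths alone.

  Assumes every length is at least 1 and that the lengths come from a valid
  prefix-free code (for example, a Huffman tree).
  """
  ordered = sorted(lengths.items(), key=lambda item: (item[1], item[0]))
  codebook: dict[str, str] = {}
  code = 0
  previous_length = ordered[0][1] if ordered else 0
  for index, (symbol, length) in enumerate(ordered):
    if index > 0:
      # next consecutive value, padded with zeros when the length grows.
      code = (code + 1) << (length - previous_length)
    codebook[symbol] = format(code, f"0{length}b")
    previous_length = length
  return codebook


LENGTHS = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}
print(canonical_codebook(LENGTHS))
# {'a': '0', 'b': '100', 'c': '101', 'd': '110', 'e': '1110', 'f': '1111'}
```

The codewords differ from the ones read off the tree, but every length is unchanged, so the cost is still 224 thousand bits, and a decoder needs only the six lengths to rebuild the table.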

Takeaways

A prefix-free code lets a stream decode unambiguously — it is a binary tree with symbols at the leaves, codeword length = leaf depth.
The goal is to minimize $B (T) = \sum_{c} c . freq \cdot d_{T} (c)$ ; optimal trees are full.
Huffman's algorithm greedily merges the two least-frequent nodes via a min-priority queue, $n - 1$ times, building the tree bottom-up.
Optimality follows the greedy template: an exchange argument shows the rarest symbols belong deepest (greedy choice), and optimal substructure closes the induction.
Running time is $O (n log n)$ , the cost of the heap operations, dropping to $O (n)$ when frequencies are pre-sorted.

Footnotes

¹ Skiena, §5, Data Compression: Huffman's greedy construction of the optimal variable-length code for a given file.
² CLRS, Ch. 16, Greedy Algorithms (§16.3): prefix-free codes and their representation as binary trees with symbols at the leaves.
³ CLRS, Ch. 16, Greedy Algorithms (§16.3): the greedy rule of repeatedly merging the two least-frequent symbols, implemented with a min-priority queue.
⁴ Erickson, Ch. 4, Greedy Algorithms (Huffman Codes): the optimality proof via greedy-choice exchange plus optimal substructure.
⁵ Shannon, C. E. (1948), A mathematical theory of communication, Bell System Technical Journal 27, 379–423 & 623–656: entropy $H$ as the lower bound on average code length; a Huffman code satisfies $H \leq \bar{L} < H + 1$.
⁶ Rissanen, J. & Langdon, G. G. (1979), Arithmetic coding, IBM Journal of Research and Development 23(2), 149–162: encoding a whole message as one interval, escaping Huffman's whole-bit-per-symbol floor.
⁷ Duda, J. (2009), Asymmetric numeral systems, arXiv:0902.0271: near-entropy compression at table-lookup speed, now used in Zstandard, LZFSE, and JPEG XL.
⁸ Larmore, L. L. & Hirschberg, D. S. (1990), A fast algorithm for optimal length-limited Huffman codes, Journal of the ACM 37(3), 464–473: the Package-Merge algorithm building optimal Huffman codes under a maximum-length constraint.
