Sieves & Factorization

The previous lesson handed us a fast test for whether a single number is prime. Many problems instead need the primes en masse: every prime below $n$, or the factorization of each of many queries, and testing each number independently wastes the structure shared across them. A sieve inverts the computation: rather than test one number at a time, it strikes out the multiples of each prime, eliminating the composites collectively. The result is a precomputed table over $1..n$ that answers "is $k$ prime?" in $O(1)$ and, with one more field, factors any $x \le n$ in $O(\log x)$.

The sieve of Eratosthenes

The idea is ancient and simple. Write the integers $2, 3, \dots, n$ . The smallest unmarked number, $2$ , is prime; cross out all of its multiples $4, 6, 8, \dots$ . The next still-unmarked number, $3$ , is prime; cross out $6, 9, 12, \dots$ . Repeat. Whenever we reach an unmarked number, no smaller prime struck it, so it has no smaller divisor, hence it is prime, and we strike its multiples in turn. When we are done, the unmarked numbers are exactly the primes.

In the grid below, $1..100$ is laid out ten per row: composites are shaded out (grey), $1$ is left blank, and the $25$ survivors $2, 3, 5, 7, 11, \dots, 97$ — the primes — are highlighted.

Figure: the sieve over $1..100$, with composites shaded out, $1$ blank, and the 25 surviving primes highlighted.

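Two optimizations make the sieve fast and are worth stating precisely. First, the outer loop can stop at $\lfloor\sqrt{n}\rfloor$: every composite $m \le n$ has a prime factor at most $\sqrt{m} \le \sqrt{n}$, so it has already been struck by then. Second, striking for a prime $c$ can start at $c^2$: each smaller multiple $k \cdot c$ with $k < c$ has a prime factor below $c$ and was struck earlier.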

Algorithm: $\textsc{Sieve}(n)$: mark composites, return the prime indicator array

1. $P[0..n] \gets \text{true}$; $P[0] \gets \text{false}$; $P[1] \gets \text{false}$
2. for $c \gets 2$ to $\lfloor\sqrt{n}\rfloor$ do
3.     if $P[c]$ then  // if candidate is prime
4.         $f \gets c^2$  // skip factors of smaller primes
5.         while $f \le n$ do
6.             $P[f] \gets \text{false}$  // sieve out the factor
7.             $f \gets f + c$  // stride by prime
8. return $P$

sieve_of_eratosthenes.py

def prime_sieve(limit: int) -> list[bool]:
  """
    The prime-indicator array over `0..limit`: `is_prime[k]` is True exactly\n
    when k is prime. Indices 0 and 1 are never prime.\n
  """
  if limit < 0:
    raise ValueError("limit must be non-negative")

  # 0 and 1 are never prime; index 0 always exists here, 1 only when limit >= 1.
  is_prime: list[bool] = [True for _ in range(limit + 1)]
  is_prime[0] = False
  if limit >= 1:
    is_prime[1] = False

  candidate: int = 2
  while candidate * candidate <= limit:
    if is_prime[candidate]:

      # smaller multiples of `candidate` already fell to smaller primes,
      # so the first new composite is candidate^2; stride by `candidate`.
      multiple: int = candidate * candidate
      while multiple <= limit:
        is_prime[multiple] = False
        multiple += candidate
    candidate += 1
  return is_prime

def primes_up_to(limit: int) -> list[int]:
  """
    The list of primes `<= limit`, ascending.\n
  """
  is_prime: list[bool] = prime_sieve(limit)
  return [number for number in range(2, limit + 1) if is_prime[number]]

def count_primes(limit: int) -> int:
  """
    The number of primes strictly less than `limit` (the `Count Primes`\n
    convention: primes in `[2, limit)`).\n
  """
  if limit <= 2:
    return 0
  return sum(prime_sieve(limit - 1))

Why it is $O (n log log n)$

The work is dominated by the inner loop, which for each prime $p \leq n$ strikes $⌊ n / p ⌋$ multiples. Summing over primes,

$$\sum_{p \le n} \frac{n}{p} = n \sum_{p \le n} \frac{1}{p}.$$

The naive worry is that $\sum 1/p$ behaves like the harmonic series $\sum 1/k = Θ(\log n)$, which would give $O(n \log n)$. But the sum runs over primes only, which are sparse, and a classical theorem of Mertens says the reciprocal sum of primes grows far more slowly:

$$\sum_{p \le n} \frac{1}{p} = \ln \ln n + O(1).$$

Hence the total work is $n (\ln \ln n + O(1)) = O(n \log \log n)$.¹ The $\log \log n$ factor is, for all practical $n$, a small constant (under $5$ for $n = 10^{9}$), so the sieve is effectively linear. Space is $O(n)$ for the array (one bit per number if packed). Starting at $p^{2}$ rather than $2p$ does not change the asymptotics but roughly halves the constant.
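To make the bound concrete, the following sketch counts the inner-loop writes directly and prints them next to $\ln \ln n$. The helper name `count_strikes` is introduced here for illustration only; it repeats the sieve above with a counter added.

```python
import math


def count_strikes(limit: int) -> int:
    """Number of inner-loop writes the sieve of Eratosthenes performs up to `limit`."""
    is_prime = [True] * (limit + 1)
    strikes = 0
    candidate = 2
    while candidate * candidate <= limit:
        if is_prime[candidate]:
            # strike candidate^2, candidate^2 + candidate, ... up to limit
            for multiple in range(candidate * candidate, limit + 1, candidate):
                is_prime[multiple] = False
                strikes += 1
        candidate += 1
    return strikes


print(count_strikes(30))   # 24: 14 strikes by 2, 8 by 3, 2 by 5

for limit in (10**3, 10**4, 10**5):
    strikes = count_strikes(limit)
    # strikes / limit should track ln ln(limit) up to an additive constant
    print(limit, strikes, round(strikes / limit, 3), round(math.log(math.log(limit)), 3))
```

The ratio of strikes to `limit` grows very slowly as `limit` increases, which is the $\log \log n$ factor in practice.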

The linear sieve and smallest prime factors

The Eratosthenes sieve strikes some composites more than once: $12$ is hit by $2$ (as $2 \cdot 6$ ) and by $3$ (as $3 \cdot 4$ ). That redundancy accounts for the $log log n$ factor. A linear sieve removes it by guaranteeing that every composite is crossed out exactly once, by its smallest prime factor (SPF). As a bonus it records that smallest prime factor, which is the key to fast factorization below.

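Maintain a growing list of primes found so far. For each $i$ from $2$ to $n$, and for each known prime $p$ in increasing order, mark the product $i \cdot p$ as composite with smallest prime factor $p$. The subtle line is the termination: the first prime that divides $i$ is $spf(i)$, and once $i \cdot spf(i)$ has been marked we break, because for any larger prime $q$ the product $i \cdot q$ has smallest prime factor $spf(i)$, not $q$.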

Algorithm: $\textsc{LinearSieve}(n)$: compute $\text{spf}[x]$ for every $x \le n$

1. $\text{spf}[0..n] \gets 0$; $\text{primes} \gets [\,]$
2. for $i \gets 2$ to $n$ do
3.     if $\text{spf}[i] = 0$ then  // $i$ is prime
4.         $\text{spf}[i] \gets i$; append $i$ to $\text{primes}$
5.     for each $p$ in $\text{primes}$ do
6.         if $p > \text{spf}[i]$ or $i \cdot p > n$ then break
7.         $\text{spf}[i \cdot p] \gets p$
8. return $\text{spf}, \text{primes}$

linear_sieve.py

from typing import NamedTuple

class SieveResult(NamedTuple):
  """
    The output of the linear sieve: the smallest-prime-factor table over\n
    `0..limit` (0 for 0 and 1, the prime itself for a prime), and the list\n
    of primes found in ascending order.\n
  """
  smallest_prime_factor: list[int]
  primes: list[int]

def linear_sieve(limit: int) -> SieveResult:
  """
    Compute `smallest_prime_factor[x]` for every `x <= limit` in linear time,\n
    alongside the ascending list of primes up to `limit`.\n
  """
  if limit < 0:
    raise ValueError("limit must be non-negative")

  smallest_prime_factor: list[int] = [0 for _ in range(limit + 1)]
  primes: list[int] = []

  for number in range(2, limit + 1):
    if smallest_prime_factor[number] == 0:

      # nothing smaller struck it, so `number` is prime.
      smallest_prime_factor[number] = number
      primes.append(number)

    for prime in primes:

      # only multiples with `prime` as their smallest factor belong here;
      # stop once `prime` exceeds spf[number] or the product exceeds `limit`.
      if prime > smallest_prime_factor[number] or number * prime > limit:
        break
      smallest_prime_factor[number * prime] = prime

  return SieveResult(smallest_prime_factor, primes)

Each composite $m \leq n$ is written once, when $i = m / spf (m)$ and $p = spf (m)$ , so the total number of marking operations equals the number of composites: the running time is $Θ (n)$ , with $O (n)$ space.² The classic Count Primes problem is solved by either sieve; the linear sieve is the right tool whenever you also need per-number factor data downstream.

The payoff is the $spf$ table itself: every prime maps to itself, every composite to its smallest prime factor, each entry written exactly once. The composite $12$ , for instance, is struck only when $i = 6, p = 2$ , never again by the larger prime $3$ .

Figure: linear sieve. Each composite carries its smallest prime factor $spf[x]$, written once.
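The once-per-composite claim can be checked empirically. The sketch below repeats the linear sieve with an extra `writes` array (a name assumed here purely for instrumentation) and confirms that every composite up to 1000 is marked exactly once.

```python
def linear_sieve_write_counts(limit: int) -> tuple[list[int], list[int]]:
    """Linear sieve that also records how many times each composite is written."""
    spf = [0] * (limit + 1)
    writes = [0] * (limit + 1)   # writes[m] = number of times m was marked composite
    primes: list[int] = []
    for number in range(2, limit + 1):
        if spf[number] == 0:
            spf[number] = number
            primes.append(number)
        for prime in primes:
            if prime > spf[number] or number * prime > limit:
                break
            spf[number * prime] = prime
            writes[number * prime] += 1
    return spf, writes


spf, writes = linear_sieve_write_counts(1000)
composites = [m for m in range(4, 1001) if spf[m] != m]
assert all(writes[m] == 1 for m in composites)                       # each composite struck once
assert all(writes[p] == 0 for p in range(2, 1001) if spf[p] == p)    # primes never struck
print(spf[12], spf[45], spf[97])   # 2 3 97
```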

Factorization

With a precomputed SPF table: $O (log x)$

Given the $spf$ array from the linear sieve, any $x \leq n$ factors by peeling off its smallest prime factor and dividing it out, repeatedly, until $1$ remains.

Algorithm: $\textsc{Factor}(x)$: full prime factorization of $x \le n$ via $\text{spf}$

1. $F \gets \{\}$  // map prime $\to$ exponent
2. while $x > 1$ do
3.     $p \gets \text{spf}[x]$
4.     while $x \bmod p = 0$ do
5.         $x \gets x / p$; $F[p] \gets F[p] + 1$
6. return $F$

spf_factorization.py

def factorize_with_spf(value: int, smallest_prime_factor: list[int]) -> dict[int, int]:
  """
    The prime factorization of `value` as a map from prime to exponent,\n
    using a precomputed `smallest_prime_factor` table that covers `value`.\n
    `value` must be below `len(smallest_prime_factor)`; any `value < 2`\n
    yields the empty map.\n
  """
  if value < 2:
    return {}
  if value >= len(smallest_prime_factor):
    raise ValueError("value exceeds the smallest-prime-factor table")

  factorization: dict[int, int] = {}
  remaining: int = value

  # read off spf[remaining], then strip every copy before moving on.
  while remaining > 1:
    prime: int = smallest_prime_factor[remaining]
    while remaining % prime == 0:
      remaining //= prime
      factorization[prime] = factorization.get(prime, 0) + 1
  return factorization

def distinct_prime_factors(value: int, smallest_prime_factor: list[int]) -> list[int]:
  """
    The distinct prime factors of `value`, ascending, via the SPF table.\n
  """
  return sorted(factorize_with_spf(value, smallest_prime_factor))

Each division by a prime $p \ge 2$ at least halves $x$, so at most $\log_{2} x$ divisions happen in total: factorization is $O(\log x)$ once the table is built. This is what makes problems like Distinct Prime Factors of Product of Array and Smallest Value After Replacing With Sum of Prime Factors tractable across many values: sieve once, then factor each query in logarithmic time.

Figure: divide by $spf[x]$ until $1$ remains; the chain of factors collects in the accent box.
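As an illustration of the sieve-once, factor-many pattern, here is a minimal self-contained sketch in the shape of the Distinct Prime Factors of Product of Array problem. The helpers `build_spf` and `distinct_primes_of_product` are names assumed for this example, and `build_spf` uses a simple Eratosthenes-style fill rather than the linear sieve, so that the block stands alone.

```python
def build_spf(limit: int) -> list[int]:
    """Smallest-prime-factor table via an Eratosthenes-style sieve."""
    spf = list(range(limit + 1))          # spf[x] = x until a smaller prime claims x
    p = 2
    while p * p <= limit:
        if spf[p] == p:                   # p is prime
            for multiple in range(p * p, limit + 1, p):
                if spf[multiple] == multiple:
                    spf[multiple] = p
        p += 1
    return spf


def distinct_primes_of_product(values: list[int]) -> set[int]:
    """Distinct primes dividing the product of `values`, without forming the product."""
    spf = build_spf(max(values))
    found: set[int] = set()
    for value in values:
        while value > 1:
            prime = spf[value]
            found.add(prime)
            while value % prime == 0:     # strip every copy of this prime
                value //= prime
    return found


print(sorted(distinct_primes_of_product([2, 4, 3, 7, 10, 6])))   # [2, 3, 5, 7]
```

The product itself is never computed: its prime factors are the union of the prime factors of the entries, each found in logarithmic time from the shared table.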

Without preprocessing: trial division and beyond

When $x$ is a one-off, or larger than any sieve we can afford, fall back to trial division: try each candidate divisor $d = 2, 3, 4, \dots$ up to $\sqrt{x}$, dividing it out whenever it divides. The $\sqrt{x}$ bound is the same observation as in the primality lesson: if $x = ab$ with $a \le b$ then $a \le \sqrt{x}$, so the smallest nontrivial factor appears by $\sqrt{x}$; any factor left after the loop is the final large prime.

Algorithm: $\textsc{TrialFactor}(x)$: factor a single $x$ in $O(\sqrt{x})$

1. $F \gets \{\}$; $d \gets 2$
2. while $d \cdot d \le x$ do
3.     while $x \bmod d = 0$ do
4.         $x \gets x / d$; $F[d] \gets F[d] + 1$
5.     $d \gets d + 1$
6. if $x > 1$ then $F[x] \gets F[x] + 1$  // leftover prime $> \sqrt{x}$
7. return $F$

trial_division.py

def trial_factor(value: int) -> dict[int, int]:
  """
    The prime factorization of `value >= 1` as a map from prime to exponent,\n
    by trial division up to sqrt(value). The empty map represents 1.\n
  """
  if value < 1:
    raise ValueError("value must be a positive integer")

  factorization: dict[int, int] = {}
  remaining: int = value

  divisor: int = 2
  while divisor * divisor <= remaining:

    # strip every copy of `divisor` before advancing.
    while remaining % divisor == 0:
      remaining //= divisor
      factorization[divisor] = factorization.get(divisor, 0) + 1
    divisor += 1

  # a leftover above 1 is a prime larger than sqrt(value).
  if remaining > 1:
    factorization[remaining] = factorization.get(remaining, 0) + 1
  return factorization

This costs $O(\sqrt{x})$. For genuinely large $x$ (say $64$-bit and beyond), $\sqrt{x}$ steps are too slow, and one reaches for Pollard's rho, a randomized factoring algorithm that finds a nontrivial factor in expected $O(x^{1/4})$ time via cycle detection on a pseudorandom map, paired with the Miller–Rabin primality test to know when a factor is itself prime and recursion can stop.³

The name comes from the shape of the orbit. Iterating $x \mapsto x^{2} + 1 \bmod n$ from a seed eventually repeats, so the trajectory runs down a tail and then loops around a cycle; drawn out, it looks like the Greek letter $ρ$. On $n = 91 = 7 \cdot 13$ the seed $2$ feeds a four-step cycle; because the cycle's residues modulo $7$ collide before they collide modulo $91$, a difference $x_{i} - x_{j}$ shares the factor $7$ with $n$, which $\gcd(x_{i} - x_{j}, n)$ then extracts.

Figure: Pollard's rho on $n = 91$ with $f(x) = x^{2} + 1$: the orbit of $2$ forms a tail into a cycle, the $ρ$ shape.
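The tail and cycle for $n = 91$ can be computed directly. This short sketch (the function name `rho_orbit` is assumed here for illustration) iterates $x \mapsto x^{2} + 1 \bmod n$ until a value repeats and splits the orbit at the point where the cycle begins.

```python
def rho_orbit(modulus: int, seed: int) -> tuple[list[int], list[int]]:
    """Split the orbit of x -> x^2 + 1 (mod modulus) into its tail and its cycle."""
    first_seen: dict[int, int] = {}
    orbit: list[int] = []
    x = seed
    while x not in first_seen:
        first_seen[x] = len(orbit)
        orbit.append(x)
        x = (x * x + 1) % modulus
    start = first_seen[x]          # index at which the cycle begins
    return orbit[:start], orbit[start:]


tail, cycle = rho_orbit(91, 2)
print(tail, cycle)                       # [2] [5, 26, 40, 54]
print([value % 7 for value in cycle])    # [5, 5, 5, 5]
```

Reduced modulo $7$, every value on the cycle is the same residue, so any two distinct cycle values differ by a multiple of $7$ while still differing modulo $91$.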

pollard_rho.py

import random
from math import gcd

# deterministic Miller-Rabin witnesses covering all 64-bit integers.
_WITNESSES: tuple[int, ...] = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_prime(number: int) -> bool:
  """
    Whether `number` is prime, by deterministic Miller-Rabin. The fixed\n
    witness set is exact for every `number < 3.18 * 10^23`.\n
  """
  if number < 2:
    return False

  # settle small primes directly and reject their larger multiples.
  for small_prime in _WITNESSES:
    if number == small_prime:
      return True
    if number % small_prime == 0:
      return False

  # write number - 1 = odd_part * 2^power_of_two.
  odd_part: int = number - 1
  power_of_two: int = 0
  while odd_part % 2 == 0:
    odd_part //= 2
    power_of_two += 1

  # each witness must reach -1 or already sit at +-1, else `number` is composite.
  for witness in _WITNESSES:
    residue: int = pow(witness, odd_part, number)
    if residue == 1 or residue == number - 1:
      continue

    # square up to `power_of_two - 1` times, hunting for -1.
    is_composite: bool = True
    for _ in range(power_of_two - 1):
      residue = (residue * residue) % number
      if residue == number - 1:
        is_composite = False
        break

    if is_composite:
      return False
  return True

def _pollard_rho_factor(number: int) -> int:
  """
    A single nontrivial factor of composite `number` via Pollard's rho,\n
    retrying with fresh parameters until the cycle yields a divisor.\n
  """
  if number % 2 == 0:
    return 2

  # retry with fresh c and start until a proper divisor falls out.
  while True:
    increment: int = random.randrange(1, number)
    slow: int = random.randrange(2, number)
    fast: int = slow
    divisor: int = 1

    # Floyd-style: advance `slow` once and `fast` twice per step, tracking
    # gcd of their difference with `number`.
    while divisor == 1:
      slow = (slow * slow + increment) % number
      fast = (fast * fast + increment) % number
      fast = (fast * fast + increment) % number
      divisor = gcd(abs(slow - fast), number)

    if divisor != number:
      return divisor

def factorize(number: int) -> dict[int, int]:
  """
    The prime factorization of `number >= 1` as a map from prime to exponent,\n
    recursively splitting composites with Pollard's rho until every part is\n
    prime by Miller-Rabin. The empty map represents 1.\n
  """
  if number < 1:
    raise ValueError("number must be a positive integer")

  factorization: dict[int, int] = {}

  def split(value: int) -> None:
    if value == 1:
      return

    # a prime part contributes one exponent and stops the recursion.
    if is_prime(value):
      factorization[value] = factorization.get(value, 0) + 1
      return

    # otherwise carve off a factor and recurse on both halves.
    factor: int = _pollard_rho_factor(value)
    split(factor)
    split(value // factor)

  split(number)
  return factorization

Worked example (rho splits $91$). Iterate $f(x) = x^{2} + 1 \bmod 91$ from $x_{0} = 2$, running a slow pointer $x$ (one step per round) against a fast pointer $y$ (two steps per round), which is Floyd's tortoise-and-hare cycle detector, and testing $\gcd(|x - y|, 91)$ at each round. After one round the slow pointer is at $f(2) = 5$ and the fast pointer at $f(f(2)) = f(5) = 26$, so we test $\gcd(|5 - 26|, 91) = \gcd(21, 91) = 7$. That is a nontrivial factor, so $91 = 7 \cdot 13$. The reason it works: modulo the hidden factor $7$, the sequence $2, 5, 26 \equiv 5, \dots$ collides after only a few steps (there are just $7$ residues), while modulo $91$ it has not yet repeated. So $x \equiv y \pmod{7}$ but $x \not\equiv y \pmod{91}$, and that gap is what makes $\gcd(x - y, 91)$ land on $7$. The expected number of steps to a collision modulo a factor $p$ is $O(\sqrt{p})$ by the birthday bound; since the smallest prime factor of a composite $n$ satisfies $p \le \sqrt{n}$, this gives the $O(n^{1/4})$ expected running time.
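The worked example can be replayed with a small tracing sketch. The function `floyd_rho_trace` is a hypothetical helper for illustration: unlike `_pollard_rho_factor` above it uses a fixed seed and a fixed increment `c = 1`, and it returns every round so the pointers can be inspected. It stops at the first gcd other than $1$, which may be `n` itself on an unlucky orbit.

```python
from math import gcd


def floyd_rho_trace(n: int, seed: int = 2, c: int = 1) -> list[tuple[int, int, int]]:
    """Rounds of (slow, fast, gcd) for x -> x^2 + c mod n, until the gcd leaves 1."""
    def step(x: int) -> int:
        return (x * x + c) % n

    slow = fast = seed
    rounds: list[tuple[int, int, int]] = []
    while True:
        slow = step(slow)            # tortoise: one step
        fast = step(step(fast))      # hare: two steps
        divisor = gcd(abs(slow - fast), n)
        rounds.append((slow, fast, divisor))
        if divisor != 1:
            return rounds


print(floyd_rho_trace(91))      # [(5, 26, 7)]
```

A single round suffices for $91$; larger inputs typically take several rounds before the gcd becomes nontrivial.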

Multiplicative functions from the factorization

Once $x = \prod_{i} p_{i}^{e_{i}}$ is in hand, a family of useful quantities can be read straight off the exponents. Each is multiplicative, meaning its value on a product of coprime numbers is the product of its values, which is why each formula is a product over the distinct primes.

Number of divisors $τ (x)$ . A divisor of $x$ chooses, independently for each prime $p_{i}$ , an exponent between $0$ and $e_{i}$ — that is $e_{i} + 1$ choices. Multiplying the independent counts,

$$τ(x) = \prod_{i} (e_{i} + 1).$$

For $360 = 2^{3} \cdot 3^{2} \cdot 5^{1}$ this is $(3 + 1) (2 + 1) (1 + 1) = 24$ divisors. The product literally counts cells of a grid: the exponents of $2$ and $3$ index a $4 \times 3$ block of divisors of $2^{3} \cdot 3^{2}$ , and the choice of the factor $5^{0}$ or $5^{1}$ stacks a second identical block behind it, so $4 \cdot 3 \cdot 2 = 24$ .

Figure: $τ(360) = (3 + 1)(2 + 1)(1 + 1)$ counts cells of an exponent grid: a $4 \times 3$ block of $2^{a} 3^{b}$, doubled by the factor $5^{0}$ or $5^{1}$.
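The grid picture translates directly into code: each divisor corresponds to one tuple of exponents. The sketch below (with the assumed helper name `divisors_from_factorization`) enumerates those tuples with `itertools.product` and confirms the count for $360$.

```python
from itertools import product


def divisors_from_factorization(factors: dict[int, int]) -> list[int]:
    """All divisors of prod p^e, one per choice of exponents 0..e for each prime."""
    primes = list(factors)
    exponent_ranges = [range(factors[p] + 1) for p in primes]
    divisors = []
    for exponents in product(*exponent_ranges):
        divisor = 1
        for prime, exponent in zip(primes, exponents):
            divisor *= prime ** exponent
        divisors.append(divisor)
    return sorted(divisors)


divisors = divisors_from_factorization({2: 3, 3: 2, 5: 1})
print(len(divisors))     # 24
print(divisors[:8])      # [1, 2, 3, 4, 5, 6, 8, 9]
```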

Four Divisors and Closest Divisors are direct applications: the former asks for numbers with $τ(x) = 4$, the latter searches divisor pairs near $\sqrt{x}$.

Sum of divisors $σ (x)$ . The divisors of $x$ are obtained by expanding $\prod_{i} (1 + p_{i} + p_{i}^{2} + \dots + p_{i}^{e_{i}})$ ; each bracket is a geometric series, so

$$σ(x) = \prod_{i} \frac{p_{i}^{e_{i} + 1} - 1}{p_{i} - 1}.$$

Euler's totient $φ (x)$ counts the integers in $[1, x]$ coprime to $x$ . By inclusion–exclusion over the distinct prime factors, removing the fraction $1/ p_{i}$ of integers each prime divides, it reduces to a product:

$$φ(x) = x \prod_{i} \left(1 - \frac{1}{p_{i}}\right).$$

For example $φ (360) = 360 (1 - \frac{1}{2}) (1 - \frac{1}{3}) (1 - \frac{1}{5}) = 96$ .
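A quick way to gain confidence in the closed forms is to compare them with the definitions. The following sketch does so for $360$; all four function names are assumptions made for this example, and the brute-force versions are only meant for small inputs.

```python
from math import gcd


def totient_brute(x: int) -> int:
    """phi(x) by definition: count k in [1, x] with gcd(k, x) = 1."""
    return sum(1 for k in range(1, x + 1) if gcd(k, x) == 1)


def sigma_brute(x: int) -> int:
    """sigma(x) by definition: sum of every d in [1, x] that divides x."""
    return sum(d for d in range(1, x + 1) if x % d == 0)


def totient_formula(factors: dict[int, int]) -> int:
    """x * prod (1 - 1/p), kept in integers as prod p^(e-1) * (p - 1)."""
    result = 1
    for prime, exponent in factors.items():
        result *= prime ** (exponent - 1) * (prime - 1)
    return result


def sigma_formula(factors: dict[int, int]) -> int:
    """prod (p^(e+1) - 1) / (p - 1), the geometric-series product."""
    result = 1
    for prime, exponent in factors.items():
        result *= (prime ** (exponent + 1) - 1) // (prime - 1)
    return result


factors_of_360 = {2: 3, 3: 2, 5: 1}
print(totient_brute(360), totient_formula(factors_of_360))   # 96 96
print(sigma_brute(360), sigma_formula(factors_of_360))       # 1170 1170
```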

multiplicative_functions.py

from collections.abc import Mapping

from trial_division import trial_factor

def _factorization(value: int, factors: Mapping[int, int] | None) -> Mapping[int, int]:
  """
    Use the supplied factorization of `value`, or compute one by trial\n
    division when none is given.\n
  """
  if factors is not None:
    return factors
  return trial_factor(value)

def divisor_count(value: int, factors: Mapping[int, int] | None = None) -> int:
  """
    tau(value): the number of positive divisors, prod (exponent + 1).\n
    Pass a precomputed `factors` map to skip refactoring.\n
  """
  if value < 1:
    raise ValueError("value must be a positive integer")

  # tau = prod (e_i + 1) over the exponents of distinct primes.
  count: int = 1
  for exponent in _factorization(value, factors).values():
    count *= exponent + 1
  return count

def divisor_sum(value: int, factors: Mapping[int, int] | None = None) -> int:
  """
    sigma(value): the sum of all positive divisors, by the geometric-series\n
    product prod (p^(e+1) - 1)/(p - 1).\n
  """
  if value < 1:
    raise ValueError("value must be a positive integer")

  # multiply the geometric-series term (p^(e+1) - 1)/(p - 1) per prime.
  total: int = 1
  for prime, exponent in _factorization(value, factors).items():
    total *= (prime ** (exponent + 1) - 1) // (prime - 1)
  return total

def euler_totient(value: int, factors: Mapping[int, int] | None = None) -> int:
  """
    phi(value): the count of integers in `[1, value]` coprime to `value`,\n
    via value * prod (1 - 1/p) = value * prod (p - 1)/p over distinct primes.\n
  """
  if value < 1:
    raise ValueError("value must be a positive integer")

  # apply the (p - 1)/p factor per distinct prime in exact integer math.
  result: int = value
  for prime in _factorization(value, factors):
    result -= result // prime
  return result

When $φ$ is needed for every number up to $n$, do not factor each one; sieve $φ$ directly. Initialize $φ[i] = i$, then for each prime $p$ sweep its multiples and apply the factor $(1 - 1/p)$ once, i.e. $φ[m] \gets φ[m] - φ[m]/p$ for each multiple $m$ of $p$:

Algorithm: $\textsc{TotientSieve}(n)$: compute $\varphi(x)$ for all $x \le n$

1. for $i \gets 0$ to $n$ do $\varphi[i] \gets i$
2. for $p \gets 2$ to $n$ do
3.     if $\varphi[p] = p$ then  // $p$ is prime
4.         $m \gets p$
5.         while $m \le n$ do
6.             $\varphi[m] \gets \varphi[m] - \varphi[m] / p$  // apply $(1-1/p)$
7.             $m \gets m + p$
8. return $\varphi$

totient_sieve.py

def totient_sieve(limit: int) -> list[int]:
  """
    The array `phi[0..limit]` of Euler totients. `phi[0] = 0`, `phi[1] = 1`,\n
    and `phi[x]` counts the integers in `[1, x]` coprime to x for x >= 1.\n
  """
  if limit < 0:
    raise ValueError("limit must be non-negative")

  totient: list[int] = list(range(limit + 1))
  for candidate in range(2, limit + 1):

    # an untouched entry equal to itself marks a prime; sweep its multiples.
    if totient[candidate] == candidate:
      multiple: int = candidate
      while multiple <= limit:
        totient[multiple] -= totient[multiple] // candidate
        multiple += candidate
  return totient

This runs in $O (n log log n)$ , the same harmonic-over-primes sum as the plain sieve, and gives every totient at once.⁴

Segmented sieving and counting the primes

Two extensions matter at scale: sieving beyond available memory, and counting primes without enumerating them.

Segmented sieving. The plain sieve needs an array of size $n$, which is impractical when $n$ is, say, $10^{12}$: even bit-packed, a trillion-entry array takes about 125 GB. The segmented sieve fixes this: compute the primes up to $\sqrt{n}$ once (a small sieve), then process $[2, n]$ in cache-sized windows $[ℓ, ℓ + Δ)$, marking each window with the multiples of every prime $\le \sqrt{n}$. Memory drops to $O(\sqrt{n} + Δ)$ while the total work stays $O(n \log \log n)$, and because each window fits in cache the constant factor improves. This is how record prime enumerations (all primes below $10^{18}$) are actually run. To count primes in a range $[ℓ, r]$, the shape of Closest Prime Numbers in Range, sieve just that window against the small primes up to $\sqrt{r}$.

Figure: segmented sieve. Small primes up to $\sqrt{n}$ (left) mark each cache-sized window of $[2, n]$ in turn, so only $O(\sqrt{n} + Δ)$ memory is live.
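A minimal sketch of the segmented sieve follows, under a few assumptions: the names `small_primes` and `primes_in_range` and the default window size are choices made for this example, a `bytearray` stands in for a packed bit array, and no attempt is made at the cache tuning a production implementation would need.

```python
from math import isqrt


def small_primes(limit: int) -> list[int]:
    """Plain sieve for the base primes <= limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            # zero out p^2, p^2 + p, ... with one slice assignment
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if flags[p]]


def primes_in_range(low: int, high: int, window: int = 1 << 16) -> list[int]:
    """Primes in [low, high], sieving one window of at most `window` numbers at a time."""
    low = max(low, 2)
    base = small_primes(isqrt(high)) if high >= 2 else []
    found: list[int] = []
    for start in range(low, high + 1, window):
        stop = min(start + window - 1, high)          # current window is [start, stop]
        flags = bytearray([1]) * (stop - start + 1)
        for p in base:
            if p * p > stop:
                break
            # first multiple of p inside the window, but never below p^2
            first = max(p * p, (start + p - 1) // p * p)
            for multiple in range(first, stop + 1, p):
                flags[multiple - start] = 0
        found.extend(start + i for i, flag in enumerate(flags) if flag)
    return found


print(primes_in_range(90, 130, window=16))   # [97, 101, 103, 107, 109, 113, 127]
```

Only the base primes and one window are alive at any time, which is the $O(\sqrt{n} + Δ)$ memory bound in miniature.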

How many primes are there? The sieve enumerates primes; the prime number theorem counts them: $π (n) \sim n / ln n$ , so a random integer near $n$ is prime with probability about $1/ ln n$ .⁵ This is what tells RSA key generation how many random candidates it must test before finding a 1024-bit prime (about $ln 2^{1024} \approx 710$ on average). Counting $π (n)$ without listing every prime — the Meissel–Lehmer method and its modern refinement by Lagarias, Miller, and Odlyzko — computes $π (n)$ in roughly $O (n^{2/3})$ time and far less space, reaching values of $n$ well beyond what any sieve could enumerate.⁶
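The prime number theorem is easy to observe numerically. This sketch (with an assumed helper `prime_count`, a plain byte sieve rather than Meissel–Lehmer) prints $π(n)$ next to the estimate $n / \ln n$ for a few powers of ten.

```python
from math import isqrt, log


def prime_count(limit: int) -> int:
    """pi(limit): the number of primes <= limit, by a plain byte sieve."""
    if limit < 2:
        return 0
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return sum(flags)


for exponent in range(3, 7):
    n = 10 ** exponent
    count = prime_count(n)
    estimate = n / log(n)
    # columns: n, pi(n), n / ln n, and their ratio
    print(n, count, round(estimate), round(count / estimate, 3))
```

The counts are $168$, $1229$, $9592$ and $78498$; the ratio in the last column stays above $1$ and drifts toward it only slowly, which is the sense of the asymptotic $\sim$.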

The linear sieve's lineage. The once-per-composite linear sieve is usually credited to Paul Pritchard's sublinear wheel sieves and to Gries and Misra (1978), who gave the SPF-recording form used here.⁷ Its main value is the smallest-prime-factor table it produces, not the marginal speedup over Eratosthenes: the table turns every subsequent factorization query into an $O (log x)$ table walk.

Takeaways

The sieve of Eratosthenes marks composites by striking each prime's multiples from $p^{2}$ with stride $p$ ; survivors are prime. The cost is $\sum_{p \leq n} n / p = O (n log log n)$ by Mertens' theorem, space $O (n)$ .
The linear sieve strikes each composite exactly once — by its smallest prime factor — running in $Θ (n)$ while recording $spf [x]$ ; the break when $p ∣ i$ is what enforces the once-only invariant.
With an SPF table, any $x \le n$ factors in $O(\log x)$ by repeatedly dividing by $spf[x]$; without preprocessing, trial division to $\sqrt{x}$ costs $O(\sqrt{x})$, and Pollard's rho + Miller–Rabin handle large $x$.
From $x = \prod p_{i}^{e_{i}}$ the multiplicative functions follow: $τ (x) = \prod (e_{i} + 1)$ , $σ (x) = \prod \frac{p _{i}^{e_{i} + 1} - 1}{p _{i} - 1}$ , and Euler's totient $φ (x) = x \prod (1 - 1/ p_{i})$ .
$φ$ over an entire range is itself sieved in $O (n log log n)$ ; never factor each number when a sieve will compute them all together.

Footnotes

1. CLRS, Ch. 31, Number-Theoretic Algorithms: divisibility, primes, and the cost of generating them; the $O(n \log \log n)$ sieve bound follows from $\sum_{p \le n} 1/p = \ln \ln n + O(1)$.
2. Skiena, § (Number Theory / Primes): the sieve of Eratosthenes and its linear refinement that records smallest prime factors for $O(\log x)$ factorization.
3. CLRS, Ch. 31, Number-Theoretic Algorithms (§31.9): Pollard's rho heuristic for factoring large integers, with Miller–Rabin (§31.8) as the companion primality test.
4. Erickson, Ch. (number theory): multiplicative functions $τ$, $σ$, $φ$ read off the prime factorization, and sieving $φ$ over a range.
5. The prime number theorem, $π(n) \sim n / \ln n$ (Hadamard and de la Vallée Poussin, 1896). See Skiena, § (Number Theory / Primes), for the algorithmic consequence for random-prime generation.
6. J. C. Lagarias, V. S. Miller, A. M. Odlyzko, Computing $π(x)$: the Meissel–Lehmer method, Mathematics of Computation 44(170), 1985.
7. D. Gries and J. Misra, A linear sieve algorithm for finding prime numbers, Communications of the ACM 21(12), 1978; P. Pritchard, A sublinear additive sieve for finding prime numbers, CACM 24(1), 1981.
