Mathematical Algorithms/Sieves & Factorization

Lesson 10.31,595 words

Sieves & Factorization

The previous lesson tested one number for primality; here we ask for all primes up to nn at once. The sieve of Eratosthenes cross-cuts composites in O(nloglogn)O(n\log\log n), and a linear sieve does it in O(n)O(n) while recording each number's smallest prime factor, which then factors any xnx \le n in O(logx)O(\log x).

╌╌╌╌

The previous lesson handed us a fast test for whether a single number is prime. Many problems instead need the primes en masse: every prime below , or the factorization of each of many queries, and testing each number independently wastes the structure shared across them. A sieve turns the question inside out. Rather than interrogate one number at a time, it lets each prime announce its own multiples, so the composites are eliminated collectively. The result is a precomputed table over that answers is prime? in and, with one more field, factors any in .

The sieve of Eratosthenes

The idea is older than computers and disarmingly simple. Write the integers . The smallest unmarked number, , is prime; cross out all of its multiples . The next still-unmarked number, , is prime; cross out . Repeat. Whenever we reach an unmarked number it has survived every smaller prime, so it has no smaller divisor, hence it is prime, and we strike its multiples in turn. When we are done, the unmarked numbers are exactly the primes.

In the grid below, is laid out ten per row: composites are shaded out (grey), is left blank, and the survivors — the primes — are highlighted.

The sieve over : composites shaded out, blank, the 25 surviving primes highlighted.

Two optimizations make the sieve fast and are worth stating precisely.

Algorithm:Sieve(n)\textsc{Sieve}(n) — mark composites, return the prime indicator array
  1. 1
    P[0..n]trueP[0..n] \gets \text{true}; P[0]P[1]falseP[0] \gets P[1] \gets \text{false}
  2. 2
    for p2p \gets 2 to n\lfloor\sqrt{n}\rfloor do
  3. 3
    if P[p]P[p] then
  4. 4
    ippi \gets p \cdot p
    earlier multiples already struck
  5. 5
    while ini \le n do
  6. 6
    P[i]falseP[i] \gets \text{false}
  7. 7
    ii+pi \gets i + p
  8. 8
    return PP

Why it is

The work is dominated by the inner loop, which for each prime strikes multiples. Summing over primes,

The naive worry is that behaves like the harmonic series , which would give . But the sum runs over primes only, which are sparse, and a classical theorem of Mertens says the reciprocal sum of primes grows far more slowly:

Hence the total work is .1 The factor is, for all practical , a small constant (under for ), so the sieve is effectively linear. Space is for the array (one bit per number if packed). Starting at rather than does not change the asymptotics but roughly halves the constant.

The linear sieve and smallest prime factors

The Eratosthenes sieve strikes some composites more than once: is hit by (as ) and by (as ). The redundancy is exactly the factor. A linear sieve removes it by guaranteeing that every composite is crossed out exactly once, by its smallest prime factor (SPF). As a bonus it records that smallest prime factor, which is the key to fast factorization below.

Maintain a growing list of primes found so far. For each from to , and for each known prime in increasing order, mark the product as composite with smallest prime factor . The subtle line is the termination: as soon as divides , we break.

Algorithm:LinearSieve(n)\textsc{LinearSieve}(n) — compute spf[x]\text{spf}[x] for every xnx \le n
  1. 1
    spf[0..n]0\text{spf}[0..n] \gets 0; primes[]\text{primes} \gets [\,]
  2. 2
    for i2i \gets 2 to nn do
  3. 3
    if spf[i]=0\text{spf}[i] = 0 then
    ii is prime
  4. 4
    spf[i]i\text{spf}[i] \gets i; append ii to primes\text{primes}
  5. 5
    for each pp in primes\text{primes} do
  6. 6
    if p>spf[i]p > \text{spf}[i] or ip>ni \cdot p > n then break
  7. 7
    spf[ip]p\text{spf}[i \cdot p] \gets p
  8. 8
    return spf,primes\text{spf}, \text{primes}

Each composite is written once, when and , so the total number of marking operations is exactly the number of composites: the running time is , with space.2 The classic Count Primes problem is solved by either sieve; the linear sieve is the right tool whenever you also need per-number factor data downstream.

The payoff is the table itself: every prime maps to itself, every composite to its smallest prime factor, each entry written exactly once. The composite , for instance, is struck only when , never again by the larger prime .

Linear sieve: each composite carries its smallest prime factor , written once

Factorization

With a precomputed SPF table:

Given the array from the linear sieve, any factors by peeling off its smallest prime factor and dividing it out, repeatedly, until remains.

Algorithm:Factor(x)\textsc{Factor}(x) — full prime factorization of xnx \le n via spf\text{spf}
  1. 1
    F{}F \gets \{\}
    map prime \to exponent
  2. 2
    while x>1x > 1 do
  3. 3
    pspf[x]p \gets \text{spf}[x]
  4. 4
    while xmodp=0x \bmod p = 0 do
  5. 5
    xx/px \gets x / p; F[p]F[p]+1F[p] \gets F[p] + 1
  6. 6
    return FF

Each division by a prime at least halves , so the outer process runs at most times: factorization is once the table is built. This is what makes problems like Distinct Prime Factors of Product of Array and Smallest Value After Replacing With Sum of Prime Factors tractable across many values: sieve once, then factor each query in logarithmic time.

Divide by until ; the chain of factors collects in the accent box

Without preprocessing: trial division and beyond

When is a one-off, or larger than any sieve we can afford, fall back to trial division: try each candidate divisor up to , dividing it out whenever it divides. The bound is the same observation as in the primality lesson: if with then , so the smallest nontrivial factor appears by ; any factor left after the loop is the final large prime.

Algorithm:TrialFactor(x)\textsc{TrialFactor}(x) — factor a single xx in O(x)O(\sqrt{x})
  1. 1
    F{}F \gets \{\}; d2d \gets 2
  2. 2
    while ddxd \cdot d \le x do
  3. 3
    while xmodd=0x \bmod d = 0 do
  4. 4
    xx/dx \gets x / d; F[d]F[d]+1F[d] \gets F[d] + 1
  5. 5
    dd+1d \gets d + 1
  6. 6
    if x>1x > 1 then F[x]F[x]+1F[x] \gets F[x] + 1
    leftover prime >x> \sqrt{x}
  7. 7
    return FF

This costs . For genuinely large (say -bit and beyond), is too slow, and one reaches for Pollard's rho, a randomized factoring algorithm that finds a nontrivial factor in expected time via cycle-detection on a pseudorandom map, paired with the Miller–Rabin primality test to know when a factor is itself prime and recursion can stop.3

The name comes from the shape of the orbit. Iterating from a seed eventually repeats, so the trajectory runs down a tail and then loops a cycle — drawn out, it looks like the Greek letter . On the seed feeds a four-step cycle; because the cycle's residues modulo collide before they collide modulo , a difference shares the factor with , which then extracts.

Pollard's rho on with : the orbit of forms a tail into a cycle — the shape

Multiplicative functions from the factorization

Once is in hand, a family of useful quantities are read straight off the exponents. Each is multiplicative, meaning its value on a product of coprimes is the product of its values, which is why each factors as a product over the distinct primes.

Number of divisors . A divisor of chooses, independently for each prime , an exponent between and — that is choices. Multiplying the independent counts,

For this is divisors. The product literally counts cells of a grid: the exponents of and index a block of divisors of , and the choice of the factor or stacks a second identical block behind it, so .

counts cells of an exponent grid: a block of , doubled by the factor or .

Four Divisors and Closest Divisors are direct applications: the former asks for numbers with , the latter searches divisor pairs near .

Sum of divisors . The divisors of are obtained by expanding ; each bracket is a geometric series, so

Euler's totient counts the integers in coprime to . By inclusion–exclusion over the distinct prime factors, removing the fraction that each prime kills, it collapses to a clean product:

For example .

When is needed for every number up to , do not factor each one; sieve directly. Initialize , then for each prime sweep its multiples and apply the factor once, i.e. for each multiple of :

Algorithm:TotientSieve(n)\textsc{TotientSieve}(n) — compute φ(x)\varphi(x) for all xnx \le n
  1. 1
    for i0i \gets 0 to nn do φ[i]i\varphi[i] \gets i
  2. 2
    for p2p \gets 2 to nn do
  3. 3
    if φ[p]=p\varphi[p] = p then
    pp is prime
  4. 4
    mpm \gets p
  5. 5
    while mnm \le n do
  6. 6
    φ[m]φ[m]φ[m]/p\varphi[m] \gets \varphi[m] - \varphi[m] / p
    apply (11/p)(1-1/p)
  7. 7
    mm+pm \gets m + p
  8. 8
    return φ\varphi

This runs in , the same harmonic-over-primes sum as the plain sieve, and gives every totient at once.4

Takeaways

  • The sieve of Eratosthenes marks composites by striking each prime's multiples from with stride ; survivors are prime. The cost is by Mertens' theorem, space .
  • The linear sieve strikes each composite exactly once — by its smallest prime factor — running in while recording ; the break when is what enforces the once-only invariant.
  • With an SPF table, any factors in by repeatedly dividing by ; without preprocessing, trial division to costs , and Pollard's rho + Miller–Rabin handle large .
  • From the multiplicative functions follow: , , and Euler's totient .
  • over an entire range is itself sieved in ; never factor each number when a sieve will compute them all together.

Footnotes

  1. CLRS, Ch. 31 — Number-Theoretic Algorithms: divisibility, primes, and the cost of generating them; the sieve bound follows from .
  2. Skiena, § — Number Theory / Primes: the sieve of Eratosthenes and its linear refinement that records smallest prime factors for factorization.
  3. CLRS, Ch. 31 — Number-Theoretic Algorithms (§31.9): Pollard's rho heuristic for factoring large integers, with Miller–Rabin (§31.8) as the companion primality test.
  4. Erickson, Ch. — (number theory): multiplicative functions , , read off the prime factorization, and sieving over a range.
Practice

╌╌ END ╌╌