Sieves & Factorization
The previous lesson tested one number for primality; here we ask for all primes up to at once. The sieve of Eratosthenes cross-cuts composites in , and a linear sieve does it in while recording each number's smallest prime factor, which then factors any in .
╌╌╌╌
The previous lesson handed us a fast test for whether a single number is prime.
Many problems instead need the primes en masse: every prime below , or the
factorization of each of many queries, and testing each number independently
wastes the structure shared across them. A sieve turns the question inside
out. Rather than interrogate one number at a time, it lets each prime announce
its own multiples, so the composites are eliminated collectively. The result is
a precomputed table over that answers is prime?
in and, with
one more field, factors any in .
The sieve of Eratosthenes
The idea is older than computers and disarmingly simple. Write the integers . The smallest unmarked number, , is prime; cross out all of its multiples . The next still-unmarked number, , is prime; cross out . Repeat. Whenever we reach an unmarked number it has survived every smaller prime, so it has no smaller divisor, hence it is prime, and we strike its multiples in turn. When we are done, the unmarked numbers are exactly the primes.
In the grid below, is laid out ten per row: composites are shaded out (grey), is left blank, and the survivors — the primes — are highlighted.
Two optimizations make the sieve fast and are worth stating precisely.
- 1;
- 2for to do
- 3if then
- 4earlier multiples already struck
- 5while do
- 6
- 7
- 8return
Why it is
The work is dominated by the inner loop, which for each prime strikes multiples. Summing over primes,
The naive worry is that behaves like the harmonic series , which would give . But the sum runs over primes only, which are sparse, and a classical theorem of Mertens says the reciprocal sum of primes grows far more slowly:
Hence the total work is .1 The factor is, for all practical , a small constant (under for ), so the sieve is effectively linear. Space is for the array (one bit per number if packed). Starting at rather than does not change the asymptotics but roughly halves the constant.
The linear sieve and smallest prime factors
The Eratosthenes sieve strikes some composites more than once: is hit by (as ) and by (as ). The redundancy is exactly the factor. A linear sieve removes it by guaranteeing that every composite is crossed out exactly once, by its smallest prime factor (SPF). As a bonus it records that smallest prime factor, which is the key to fast factorization below.
Maintain a growing list of primes found so far. For each from to , and for each known prime in increasing order, mark the product as composite with smallest prime factor . The subtle line is the termination: as soon as divides , we break.
- 1;
- 2for to do
- 3if thenis prime
- 4; append to
- 5for each in do
- 6if or then break
- 7
- 8return
Each composite is written once, when and
, so the total number of marking operations is exactly the
number of composites: the running time is , with space.2
The classic Count Primes problem is solved by either sieve; the linear sieve is
the right tool whenever you also need per-number factor data downstream.
The payoff is the table itself: every prime maps to itself, every composite to its smallest prime factor, each entry written exactly once. The composite , for instance, is struck only when , never again by the larger prime .
Factorization
With a precomputed SPF table:
Given the array from the linear sieve, any factors by peeling off its smallest prime factor and dividing it out, repeatedly, until remains.
- 1map prime exponent
- 2while do
- 3
- 4while do
- 5;
- 6return
Each division by a prime at least halves , so the outer process runs
at most times: factorization is once the table is built.
This is what makes problems like Distinct Prime Factors of Product of Array and
Smallest Value After Replacing With Sum of Prime Factors tractable across many
values: sieve once, then factor each query in logarithmic time.
Without preprocessing: trial division and beyond
When is a one-off, or larger than any sieve we can afford, fall back to trial division: try each candidate divisor up to , dividing it out whenever it divides. The bound is the same observation as in the primality lesson: if with then , so the smallest nontrivial factor appears by ; any factor left after the loop is the final large prime.
- 1;
- 2while do
- 3while do
- 4;
- 5
- 6if thenleftover prime
- 7return
This costs . For genuinely large (say -bit and beyond), is too slow, and one reaches for Pollard's rho, a randomized factoring algorithm that finds a nontrivial factor in expected time via cycle-detection on a pseudorandom map, paired with the Miller–Rabin primality test to know when a factor is itself prime and recursion can stop.3
The name comes from the shape of the orbit. Iterating from a seed eventually repeats, so the trajectory runs down a tail and then loops a cycle — drawn out, it looks like the Greek letter . On the seed feeds a four-step cycle; because the cycle's residues modulo collide before they collide modulo , a difference shares the factor with , which then extracts.
Multiplicative functions from the factorization
Once is in hand, a family of useful quantities are read straight off the exponents. Each is multiplicative, meaning its value on a product of coprimes is the product of its values, which is why each factors as a product over the distinct primes.
Number of divisors . A divisor of chooses, independently for each prime , an exponent between and — that is choices. Multiplying the independent counts,
For this is divisors. The product literally counts cells of a grid: the exponents of and index a block of divisors of , and the choice of the factor or stacks a second identical block behind it, so .
Four Divisors and Closest Divisors are direct applications: the former asks
for numbers with , the latter searches divisor pairs near .
Sum of divisors . The divisors of are obtained by expanding ; each bracket is a geometric series, so
Euler's totient counts the integers in coprime to . By inclusion–exclusion over the distinct prime factors, removing the fraction that each prime kills, it collapses to a clean product:
For example .
When is needed for every number up to , do not factor each one; sieve directly. Initialize , then for each prime sweep its multiples and apply the factor once, i.e. for each multiple of :
- 1for to do
- 2for to do
- 3if thenis prime
- 4
- 5while do
- 6apply
- 7
- 8return
This runs in , the same harmonic-over-primes sum as the plain sieve, and gives every totient at once.4
Takeaways
- The sieve of Eratosthenes marks composites by striking each prime's multiples from with stride ; survivors are prime. The cost is by Mertens' theorem, space .
- The linear sieve strikes each composite exactly once — by its smallest
prime factor — running in while recording ; the
breakwhen is what enforces the once-only invariant. - With an SPF table, any factors in by repeatedly dividing by ; without preprocessing, trial division to costs , and Pollard's rho + Miller–Rabin handle large .
- From the multiplicative functions follow: , , and Euler's totient .
- over an entire range is itself sieved in ; never factor each number when a sieve will compute them all together.
Footnotes
- CLRS, Ch. 31 — Number-Theoretic Algorithms: divisibility, primes, and the cost of generating them; the sieve bound follows from . ↩
- Skiena, § — Number Theory / Primes: the sieve of Eratosthenes and its linear refinement that records smallest prime factors for factorization. ↩
- CLRS, Ch. 31 — Number-Theoretic Algorithms (§31.9): Pollard's rho heuristic for factoring large integers, with Miller–Rabin (§31.8) as the companion primality test. ↩
- Erickson, Ch. — (number theory): multiplicative functions , , read off the prime factorization, and sieving over a range. ↩
╌╌ END ╌╌