Foundations/Asymptotic Analysis

Lesson 1.22,107 words

Asymptotic Analysis

We measure an algorithm's running time as a function of its input size, then strip away machine-specific constants and lower-order terms to compare algorithms cleanly. This lesson defines the RAM model and the OO, Ω\Omega, Θ\Theta, oo, and ω\omega notations, and shows how to read the cost of loops off the page.

╌╌╌╌

In the previous lesson we saw the same algorithm, insertion sort, cost quadratically many comparisons on one input and linearly many on another, all on the same machine. To compare algorithms as algorithms, independent of the hardware and the particular input, we need two things: a model of computation abstract enough to ignore the machine, and a notation coarse enough to ignore constants. This lesson supplies both.

The RAM model of computation

We analyze algorithms against an idealized machine, the random-access machine (RAM). It has the properties all three books assume, usually tacitly:

  • Instructions execute one at a time, no concurrency.
  • The basic operations, namely arithmetic (, , , ), comparisons, data movement (load, store, copy), and control flow, each take a constant amount of time.
  • Memory is an unbounded array of cells, and accessing any cell by its index costs the same constant (this is what random access means).
  • Each cell holds an integer or float of reasonable size, roughly bits for an input of size , so a single value fits in a machine word.

The RAM is a deliberate fiction. Real multiplication is not truly constant-time for arbitrarily large numbers; real memory has caches that make some accesses far cheaper than others. But the model is predictive: an algorithm that is fast on the RAM is, overwhelmingly, fast in practice. Skiena stresses this engineering payoff;1 CLRS is careful to flag the places, such as bignum arithmetic, where the constant-word assumption breaks down.2

From problem to

It helps to be precise about what we are even measuring. A computational problem is just a function from a set of possible inputs to a set of possible outputs; each element is an instance of . Sorting integer arrays, for example, has . A size function records how big each instance is. For an array we take . An algorithm solves if for every instance .

Now charge = the number of elementary RAM operations performs on input . Inputs of the same size can cost different amounts, so we take the worst one of each size:

When the algorithm is understood, this is exactly the function we denote , the running time as a function of the input size . The size is usually the number of elements, but sometimes the number of bits, or two parameters (e.g. and for a graph); choosing the right size measure is the first decision in any analysis. What we ultimately want is a good, convenient-to-understand upper bound on , which is precisely what the notation below provides.

Worst, average, and best case

For a fixed input size , different inputs of that size may cost different amounts. Insertion sort costs on a sorted array and on a reversed one. So is not one number; it is a range. We summarize it three ways:

  • Worst case : the maximum cost over all inputs of size .
  • Best case: the minimum cost over all inputs of size .
  • Average case: the expected cost over a probability distribution on inputs of size (usually the uniform distribution).
For each fixed size , the cost is a range: best at the bottom edge, worst at the top, average in between. The shaded band is the spread over all inputs of that size. Slicing at one chosen size pins the three cases to three points on the vertical — the runtime of any single input of size lands somewhere on that segment.

We almost always report the worst case. It is a guarantee: the algorithm never does worse, no matter how adversarial the input. The best case is nearly useless as a promise, since any algorithm looks good on its luckiest input. The average case is the most honest predictor of typical performance but requires us to commit to a distribution, and the analysis is usually harder (it often needs the probabilistic tools of a later module). CLRS develops all three; Skiena argues that for design purposes the worst case is what you should plan for.

Why we drop constants and lower-order terms

Suppose careful counting gives the running time of some algorithm as

Two facts make most of this expression noise:

  1. The leading term dominates. As grows, swamps . At the quadratic term is and the rest is , under . The growth rate is governed entirely by .
  2. The constants are machine artifacts. The depends on how many RAM operations our particular pseudocode spends per iteration; recompile on a different machine and it changes. It says nothing about the algorithm's intrinsic scaling.

So we throw both away and say the running time is order . This is not sloppiness; it is the right level of abstraction. An algorithm with a tiny constant still loses eventually to an algorithm with a large one, and eventually is exactly what asymptotic analysis captures. The notation below makes order precise.

The asymptotic notations

Let and be functions from the positive integers to the nonnegative reals. The notations describe how behaves relative to for all sufficiently large .

Big-O: asymptotic upper bound

State it as a clean existential:

The constant lets us ignore multiplicative factors; the threshold lets us ignore small inputs where lower-order terms might still dominate. (CLRS phrases as a set of functions and adds to keep things nonnegative; the two readings agree.)

The picture to keep in mind is the for large sketch: the scaled curve rises above once passes the threshold , and stays above forever after. What happens to the left of is irrelevant.

Curve rises above once passes the threshold .

The wanted inequality method. Proofs of -bounds follow a fixed recipe: write down the inequality you want to hold, then reverse-engineer constants and that make it true. To prove , we want . For each lower term is at most , so ; thus , work. The same move handles any polynomial; see the theorem below.

Big-Omega: asymptotic lower bound

Mirror the quantifiers, flip the inequality:

It is the mirror image of : if and only if .

Big-Theta: asymptotic tight bound

pins between two constant multiples of : it grows exactly as fast as , up to constants. The fundamental link is

When we say insertion sort is in the worst case, we mean its worst-case cost is sandwiched between and , a precise, two-sided claim. When we only have an upper bound we say ; this is why people loosely write even where holds. But the distinction matters, as the next two results make concrete.

The polynomial theorem

The single most useful fact for everyday analysis collapses every polynomial to its leading power:

Upper bound (). Replace every coefficient by its absolute value and every lower power by (valid for ):

So define and pick .

Lower bound (). Pull out the leading term and bound the rest below; for ,

The wanted inequality method now finds an past which the negative tail is, say, at most half of , leaving for all . Together the two directions give : drop the constants and lower-order terms, keep the leading power.

is an upper bound, not a promise of tightness

Consider a worked cautionary case. Exchange sort compares with for every pair , so its running time satisfies

by the polynomial theorem. But also implies the true but useless statement : a correct upper bound need not be tight. So how loose can we go? Not below . The number of pairs is itself a lower bound on the work:

This forces ; indeed for any . No constant can keep under once is large enough, because whenever . The moral: alone tells you the cost is no worse than something; only matching it with (i.e. proving ) certifies you have found the true growth rate.

Picture the exponents on a line. Exchange sort's cost sits at . Every with is a valid upper bound, but only is tight; the side forbids any upper bound with , walling off the left. Where the two bounds meet is .

Valid bounds for exchange sort's cost, by exponent. Every with holds but only is tight; rules out the whole region below . The bounds pinch shut exactly at .

Little-o and little-omega: strict bounds

and allow and to grow at the same rate. The lowercase versions forbid that; they assert a strict gap.

So means becomes negligible compared to : e.g. and . Symmetrically, is defined by the reciprocal limit, so dominates . A useful analogy from CLRS: are to functions as are to numbers.

Each asymptotic notation mirrors a comparison on numbers: line up with . The little-o and little-omega ends are strict (they forbid equal growth), exactly as and are strict.

Comparing functions with limits

The limit of the ratio is how you rank growth rates:

To compare, say, and for any constants , one shows the ratio , so every polynomial beats every polylogarithm.3 Likewise for every constant : exponentials dominate all polynomials. A useful fact for sums: (by Stirling's approximation),4 which is why comparison sorts that do work are optimal in a sense we will prove later.

Why we write with no base. Inside asymptotic notation the base is irrelevant, because changing base only multiplies by a constant:

So , , and all live in the same -class, and means the same thing whichever base you had in mind. (The constant is exactly the multiplicative factor and are designed to absorb.)

The growth hierarchy

The functions algorithm designers meet most often, in strictly increasing order of growth, are:

ClassNameA typical algorithm
constantarray index, hash lookup (expected)
logarithmicbinary search
linearscan / find max
linearithmicmerge sort, heapsort
quadraticinsertion sort (worst case)
cubicnaive matrix multiply
exponentialsubset enumeration
factorialbrute-force permutations (TSP)

Each row's growth dwarfs every row above it for large . The practical dividing line Skiena draws is between polynomial time ( for a constant ), generally considered tractable, and exponential time, which becomes hopeless very fast.5 A factorial-time algorithm that handles in a second needs years at .

A sketch of the curves makes the separation clear. Even with a generous constant on the slower-growing function, the faster one wins past some crossover :

Growth-rate curves for , , , and .

Reading the cost off loops

Most analysis reduces to counting how many times each line runs. A few rules cover the common cases.

Sequential blocks add; we keep the max. If block costs and is followed by block costing , the total is : the larger term absorbs the smaller.

A simple loop multiplies the body by the iteration count. This nested fragment runs the constant-time body times:

Algorithm:Counting work in nested loops
  1. 1
    for i1i \gets 1 to nn do
  2. 2
    for j1j \gets 1 to nn do
  3. 3
    cc+A[i]B[j]c \gets c + A[i] \cdot B[j]
    Θ(1)\Theta(1) body, n2n^2 times

so its cost is .

When the inner bound depends on the outer index, sum a series. If the inner loop runs , the body executes times — still quadratic, because triangular work is half of square work, and the constant vanishes. This is exactly the shape of insertion sort's worst case.

When the loop variable is scaled, take a logarithm. A loop that does until runs about times, since doubles each pass. This is the source of every logarithm in algorithm analysis: repeatedly halving (or doubling) gives steps. Binary search and balanced-tree depth are the canonical examples.

The three loop shapes read off as area (or, for doubling, as the number of rungs). A square grid of iterations is work; bounding the inner loop by the outer index fills only the triangle below the diagonal, half as much but still ; and a doubling index touches just the powers of two.

Reading loop cost as area. A square grid of iterations is work; an inner bound fills only the triangle, ; a doubling index visits just rungs.

Two more loop shapes round out the catalog, and both read off as areas just like the three above. An outer loop of passes wrapped around a doubling inner loop does work per pass — an grid, . And a loop whose live problem halves every pass while doing linear work on what remains sums a geometric series , so the whole stack of shrinking bars is no taller than two of the first — .

Two more loop costs as area. A linear loop wrapping a doubling inner loop fills an grid (); a loop that halves its live problem each pass stacks bars that sum to ().

The recurrences that arise when a loop is replaced by recursion, a function calling smaller copies of itself, need their own machinery, which is the subject of the next lesson.

Takeaways

  • The RAM model charges constant time per primitive operation and lets us measure running time as a function of input size, machine-independently.
  • Report the worst case by default: it is a guarantee. Best case promises nothing; average case is honest but needs a distribution.
  • Drop constants and lower-order terms. They are machine artifacts and noise; the leading term's growth rate is what scales.
  • is an upper bound, a lower bound, a tight (two-sided) bound; and are their strict versions. and .
  • The limit ranks two functions: , a constant, or give , , or .
  • Memorize the hierarchy ; the polynomial/exponential boundary is the line between tractable and hopeless.

Footnotes

  1. Skiena, §2 — Algorithm Analysis: the RAM model as a machine-independent abstraction whose engineering payoff is predicting real-world speed.
  2. CLRS, Ch. 3 — Characterizing Running Times: the RAM model's constant-word assumption and where it breaks down (e.g. bignum arithmetic).
  3. Erickson, Algorithms, Appendix — Solving Recurrences (analysis throughout): ranking growth rates by the limit of the ratio, so every polynomial dominates every polylogarithm.
  4. CLRS, Ch. 3 — Characterizing Running Times: via Stirling's approximation.
  5. Skiena, §2 — Algorithm Analysis: the polynomial-vs-exponential dividing line between tractable and hopeless running times.
Practice

╌╌ END ╌╌