Data Structures/Elementary Data Structures

Lesson 4.11,565 words

Elementary Data Structures

Every container is built one of two ways: contiguous in an array, or linked through pointers. We trade cache-friendly random access against O(1)O(1) splicing, derive the **amortized O(1)O(1) append of a doubling dynamic array, and assemble the two ordered access disciplines — the LIFO stack and the FIFO queue (with its generalization, the deque**) — on top of both.

╌╌╌╌

Before we can balance trees or hash keys, we need the raw material they are made of. Every data structure in this course (every tree, heap, hash table, and graph) is ultimately stored one of two ways, and the choice echoes through everything built on top of it. You can lay the elements out contiguously, packed shoulder-to-shoulder in one block of memory, or you can scatter them and stitch them together with pointers. This first lesson works out the two strategies and the four ordered containers (array, linked list, stack, queue) that the rest of the module assumes you already own.

Two ways to store a sequence

A contiguous structure stores its elements in a single block of memory, one after another. An array of elements each of size occupies one run of bytes, so element lives at a known offset from the start. A linked structure stores each element in its own separately-allocated node and uses a pointer in each node to find the next; the nodes may sit anywhere in memory.1

The contrast is sharp, and it drives every later choice:

  • Random access. Contiguous wins outright. Element of an array is at address , computed with one multiply-add, so indexing is . In a linked list there is no address arithmetic; to reach the -th node you must follow pointers, which is .
  • Splicing. Linked wins outright. To insert or delete an element given a pointer to its node, a linked list rewires a constant number of pointers in ; an array must shift every later element to keep the block contiguous, which is .
  • Cache locality. Contiguous wins. A modern CPU reads memory in cache lines and prefetches sequentially, so a linear scan of an array is far faster than chasing pointers across scattered nodes, even though both are comparisons. Constant factors, not asymptotics, yet they are large.
  • Space overhead. Linked pays per-element: every node carries one or two pointers besides its key. An array pays nothing per element but may reserve unused capacity (below).
The same sequence stored two ways. Contiguous: one block, element at , so indexing is address arithmetic. Linked: three separately-allocated nodes scattered in memory, each pointing to the next, so reaching element means following pointers.

Arrays and dynamic arrays

A fixed-size array is the contiguous structure in its purest form: allocate slots up front, index any of them in . Its limitation is exactly that is fixed at allocation. Real programs rarely know the final size in advance, so we want a structure that grows.

A dynamic array (C++ vector, Python list, Java ArrayList) keeps a contiguous backing block of some capacity the current size, and appends into the spare room. When the block fills, it allocates a larger block, copies the elements over, and frees the old one. The design decision that matters is how much larger, and the answer is to double the capacity.

Algorithm:Append(A,x)\textsc{Append}(A, x) — push xx, doubling the backing block on overflow
  1. 1
    if size(A)=capacity(A)size(A) = capacity(A) then
  2. 2
    capmax(1,2capacity(A))cap' \gets \max(1, 2 \cdot capacity(A))
  3. 3
    allocate new block BB of capacity capcap'
  4. 4
    copy A[0..size(A)1]A[0\,..\,size(A)-1] into BB
    the O(n)O(n) resize
  5. 5
    free old block; store(A)Bstore(A) \gets B; capacity(A)capcapacity(A) \gets cap'
  6. 6
    A[size(A)]xA[size(A)] \gets x
  7. 7
    size(A)size(A)+1size(A) \gets size(A) + 1

A single append is usually , writing into spare room and bumping the size, but the appends that trigger a resize cost because they copy the whole array. The worst case of one operation is therefore . Yet the average cost over a sequence of appends is , and this is worth proving.

A resize on overflow. The full block of capacity cannot take a fifth element, so a new block of capacity is allocated, the four elements are copied over (the step, red), is written into the first spare slot, and the old block is freed. The four trailing slots are reserved-but-unused capacity.

The geometric growth is what makes this work: each resize is twice as expensive as the last but happens half as often, so the costs telescope into a constant per operation. Growing by a fixed increment instead of doubling would make the same appends cost . The price of amortized is twofold: an occasional latency spike on the doubling step, and up to wasted capacity right after a resize.2

Doubling growth over appends: the capacity staircase (accent) jumps , while the size (filled) rises by one each append. Each jump is a copy, and the gap above the fill is the reserved-but-unused capacity.

Linked lists

A linked list threads elements through pointers. In a singly linked list each node stores a and a pointer to its successor; a pointer names the first node and the last node's is . A doubly linked list adds a pointer, so the list can be traversed in both directions and a node can be removed knowing only itself.

A doubly linked list; deleting the middle node is pointer splicing

The complexities follow directly from the pointer structure:

  • Insert / delete given the node. . To delete node from a doubly linked list, set and , a constant number of pointer writes, no shifting. This is the linked list's signature advantage over an array.
  • Search by key, or index by position. . There is no address arithmetic; you must walk the chain.

The boundary cases (deleting the head, deleting the tail, operating on an empty list) force nil checks that clutter the code. A standard trick removes them: a sentinel is a dummy node that is always present and never holds real data. Wrap the list into a ring around one sentinel , with the first real node and the last; now every node has a real predecessor and successor, and delete needs no special cases.3

Algorithm:List-Delete(x)\textsc{List-Delete}(x) — remove xx from a doubly linked list (sentinel form)
  1. 1
    next(prev(x))next(x)next(prev(x)) \gets next(x)
  2. 2
    prev(next(x))prev(x)prev(next(x)) \gets prev(x)

With a sentinel there are no nil guards: even at the ends, and point at real nodes (possibly the sentinel itself), so the two assignments always make sense.

operationarraylinked list
index / random access
search (unsorted)
insert/delete at known position shift splice
insert/delete at end amortized
cache localityexcellentpoor
extra space per elementnone1–2 pointers

Neither structure dominates: choose contiguous when you index and scan, linked when you splice in the middle and never need the -th element by number.

Stacks: last in, first out

A stack restricts access to one discipline: LIFO, last in, first out. Only the most recently inserted element is reachable. The operations are (add to the top), (remove the top), and (read the top without removing) — all .

A stack is trivial to back with a dynamic array: keep a index, push by writing and incrementing, pop by decrementing. (Amortized if it must grow.) Equally, a singly linked list with push/pop at the head is a stack with worst-case operations and no resize spikes.

Stacks appear wherever computation is nested: the call stack that holds function activation records, depth-first search, evaluating arithmetic expressions, and matching brackets. For brackets, push each opener, then pop and check on each closer, accepting iff the stack ends empty. That last pattern is the Valid Parentheses problem, and it is a stack in one line of intuition.

Queues and deques: first in, first out

A queue enforces the opposite discipline: FIFO, first in, first out, like a line at a counter. adds at the tail; removes from the head. Both are .

A stack is LIFO (one end, the top); a queue is FIFO (insert at tail, remove at head)

Backing a queue with an array needs care: if we always dequeued from index we would shift the whole array each time, . The fix is a circular buffer. Keep a fixed array of capacity and two indices, and ; enqueue writes and advances , dequeue reads and advances . The indices chase each other around the ring, reusing freed slots, so both operations stay with no shifting and no wasted scanning.4 (When the buffer fills, resize and re-lay-out into a larger ring at amortized , exactly as for the dynamic array.)

A circular buffer of capacity holding four elements: and chase each other around the ring, and advancing past slot wraps to slot .

A deque (double-ended queue, pronounced deck) generalizes both: it supports insert and delete at both ends. A deque used at one end only is a stack; used to push at one end and pop at the other, it is a queue, so the deque subsumes everything in this lesson. A doubly linked list with a head and tail sentinel implements a deque directly, and a circular buffer with both indices movable in either direction does too; the Design Circular Deque problem asks for exactly the latter.

Takeaways

  • Every container is either contiguous (an array — random access by address arithmetic, cache-friendly, to splice) or linked (nodes joined by pointers — splice given the node, to index, a pointer of overhead per element). The choice is a trade, not a winner.
  • A dynamic array appends in amortized by doubling capacity on overflow: the aggregate copy cost over appends is . The worst-case single append is still on the resize step.
  • A doubly linked list inserts and deletes in given the node; a sentinel node erases the boundary cases so delete is two pointer writes.
  • A stack is LIFO (, all ) and underlies the call stack, DFS, expression evaluation, and bracket matching.
  • A queue is FIFO, implemented as a circular buffer with and indices advanced for ends; the deque generalizes both stack and queue to operations at either end.

Footnotes

  1. Skiena, §3.1–3.2, Contiguous vs. Linked Structures: the array-vs-pointer trade-off and its consequences for access, splicing, and locality.
  2. CLRS, Ch. 10, Elementary Data Structures (with the amortized analysis of Ch. 16): geometric doubling gives amortized table append.
  3. CLRS, Ch. 10, Elementary Data Structures (§10.2): doubly linked lists and the sentinel that removes boundary cases from insert/delete.
  4. CLRS, Ch. 10, Elementary Data Structures (§10.1): stacks and the circular-array queue with head/tail indices taken .
Practice

╌╌ END ╌╌