← back to programming

Storage & Garbage Collection

Abelson & Sussman · 1996 · SICP §5.3

Memory is a vector of pairs. Garbage collection reclaims unreachable storage. Stop-and-copy GC compacts live data by copying it to new space, then flipping.

before X X X live free garbage stop-and-copy after stop-and-copy: live data moves, garbage stays behind

Memory as vectors

At the lowest level, memory is a flat array of numbered slots. Lisp pairs become two parallel vectors: the-cars and the-cdrs. Index n across both vectors forms one pair. A pointer is just an index.

Scheme

Representing lists using vectors

A list is a chain of pairs where each cdr points to the next pair (or to the empty list). Walking the list means following cdr pointers through the vector. Linked lists work the same way in hardware: each “next” field is a memory address.

Scheme

Stop-and-copy garbage collector

When free memory runs out, the collector copies all reachable data from “from-space” to “to-space,” then flips. Dead data stays behind. The copy is contiguous, so fragmentation vanishes. Cost scales with live data, not total memory.

Scheme

The broken-heart algorithm

When the collector copies a cell to new space, it leaves a broken heart (forwarding pointer) in old space. If it encounters the same cell again via another reference, it follows the forwarding pointer instead of copying twice. This handles shared structure and cycles.

Scheme

Free list vs compacting collection

An alternative to stop-and-copy is a free list: link all unused cells into a chain, allocate from the head. Simpler, but fragmentation accumulates. Stop-and-copy trades the cost of moving live data for contiguous allocation: every alloc is a pointer bump.

Scheme

Notation reference

Scheme / SICP Python Meaning
(cons a b)(a, b) or [a, b]Allocate a pair
(car p) / (cdr p)p[0] / p[1]First / second element of pair
(vector-ref v i)v[i]Read from vector/array
(vector-set! v i x)v[i] = xWrite to vector/array
broken-heartforwarding dictMarker: cell already copied
from-space / to-spacefrom_* / to_*Two memory half-spaces for GC

Translation notes

Python hides all of this behind the runtime. sys.getrefcount() exposes reference counts; gc.collect() triggers the cycle collector. CPython uses reference counting (immediate collection) plus a generational collector for cycles, not stop-and-copy. The SICP model is closer to what the JVM or Go runtime does with copying/compacting collectors. The vector representation of pairs maps directly to struct-of-arrays layouts in systems programming.

Neighbors

Other reading pages

  • SICP Ch.19 — Register Machine Simulator (the substrate this chapter manages memory for)

Foundations (Wikipedia)

Ready for the real thing? Read SICP §5.3.