The Parts Bin
Part of the cognition series. Builds on The Handshake.
Near-misses
The Natural Framework derives six roles from temporal flow and bounded storage. The Handshake gives each step a contract: precondition and postcondition. The CS textbook is full of operations that almost satisfy them. Almost is the diagnosis.
PageRank is the first specimen. Its postcondition says “ranked by authority” but the Attend contract requires diversity and a bound. It made a lot of money regardless. A broken step doesn’t kill instantly; it compounds.
Google bolted on re-ranking, topic diversity, and freshness signals over two decades: incremental upgrades toward a contract-preserving morphism, one patch at a time.
Quicksort is the second specimen. It satisfies order, the most visible guarantee. But the Attend contract also requires diversity (survivors are dissimilar) and boundedness (output is finite top-k, not a total order). Quicksort is the default because order is the only guarantee most systems measure.
Most familiar algorithms are near-misses. They satisfy some guarantees but not all. The formal test is iteration stability: run the full loop — Perceive through Remember and back — and observe which postcondition degrades. Diversity is the guarantee iteration kills first, because without repulsion between winners the same cluster dominates every cycle. Noticing which guarantee fails under iteration: that’s the diagnostic power.
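A toy version of the iteration-stability test. The clustered items and scores here are hypothetical; the point is that score-only top-k lets the highest-scoring cluster monopolize the bounded slate, so diversity is the postcondition that dies.

```python
import random

# Hypothetical toy data: three clusters of ten items each; cluster 0's
# scores are uniformly higher. No repulsion between winners.
random.seed(0)
items = [{"cluster": c, "score": random.random() + (1.0 if c == 0 else 0.0)}
         for c in range(3) for _ in range(10)]

def top_k(pool, k):
    # Satisfies order and bound; says nothing about diversity.
    return sorted(pool, key=lambda x: -x["score"])[:k]

survivors = top_k(items, 5)
clusters = {s["cluster"] for s in survivors}
print(clusters)  # {0} — one cluster dominates the whole slate
```

Run the loop again on the survivors and the same cluster wins again: the failure is stable under iteration, which is exactly what makes it diagnosable.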
Degenerate cases are the other edge. A chatbot has no policy store: nil Filter, nil Attend, nil Consolidate, nil Remember. Token in, token out, same rate. That is passthrough, predicted by the existence proofs when policy is zero.
Union-find has no selection: every element is kept, partitions only merge, the structure grows monotonically. No competitive core, no lossy step, no compression. Both are useful. Neither learns.
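A minimal union-find sketch makes the degeneracy visible: every element is kept, partitions only merge, and nothing is ever discarded or compressed.

```python
# Union-find with path halving. Note what is absent: no selection,
# no lossy step — the structure only grows and merges.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb  # merge only; never delete

for a, b in [(1, 2), (3, 4), (2, 3)]:
    union(a, b)
print(find(1) == find(4))  # True: one partition, all four elements kept
```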
Diagnostic resolution
The monad is the container — it doesn’t break. What breaks is a morphism’s postcondition. What cascades is the composition. The derivation forces contracts of this shape, and the induction proof shows they compose forever if they compose once.
Step N+1’s precondition is step N’s postcondition. A correct algorithm with a broken precondition is a correct algorithm that produces garbage.
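The handshake can be made executable. A minimal sketch, with contract labels as stand-ins for real predicates (all names here are illustrative): composition refuses to chain step N+1 unless its precondition is step N's postcondition.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    pre: str   # contract labels; real systems would use predicates
    post: str
    run: Callable

def compose(a: Step, b: Step) -> Step:
    # The handshake: b's precondition must be a's postcondition.
    if a.post != b.pre:
        raise TypeError(f"{b.name} expects {b.pre!r}, {a.name} gives {a.post!r}")
    return Step(f"{a.name};{b.name}", a.pre, b.post,
                lambda x: b.run(a.run(x)))

filt = Step("Filter", "indexed", "selected", lambda xs: [x for x in xs if x > 0])
attend = Step("Attend", "selected", "ranked", lambda xs: sorted(xs, reverse=True)[:3])
pipe = compose(filt, attend)
print(pipe.run([5, -1, 2, 9, 0, 7]))  # [9, 7, 5]
```

Composing in the wrong order raises at composition time, not at run time — the garbage never gets produced.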
Governance operates at zero resolution. It can’t see the interface, so it replaces the whole composition: fire everyone, rewrite from scratch, new system. Expensive. The framework increases resolution: instead of “the system is broken,” the diagnosis is “Attend’s diversity guarantee is missing.” The parts bin increases it further: instead of “Attend is broken,” the prescription is “this is a top-k sort where you need MMR re-ranking.” Swap one operation, same slot, contract restored.
The most efficient fix requires the most precise name. The most feared enemy is one who cannot be named. The handshake is a naming system.
Diagnosing the LLM took a couple of hours. Three layers, six steps each, SOAP notes with six-component plans. That precision came from the framework, not from domain expertise in ML. Before germ theory: “the patient has bad air.” After: “streptococcus, here’s penicillin.” The framework is the microscope. The handshake is why the microscope works.
Agent
The framework is the diagnostic manual. The parts bin, once ordered, is the pharmacy. The handshake is why the prescriptions compose. What would it look like to use them systematically?
Describe. A product manager says: “users sign up but never come back.” An agent maps this to the six steps. Cache works. Users arrive and data is stored. Filter is missing. Users get everything, keep nothing. Consolidate is nil. Nothing changes between sessions.
Diagnose. The agent isolates Filter. Then drills deeper: is it the precondition (wrong input from upstream), the operation (wrong mechanism), the postcondition (contract not satisfied), the fidelity (contract satisfied but too lossy), or the scale (right operation, wrong timescale)?
Prescribe. The agent queries the taxonomy. Filters by: matching precondition, matching postcondition, sufficient fidelity, compatible scale. Returns a ranked list of candidate operations from across domains. “Your Filter slot needs: output strictly smaller, criterion applied uniformly. Candidates from the parts bin, ordered by fidelity and cost.”
Validate. The agent checks that the prescribed operation’s postcondition matches the next step’s precondition. If not, it flags the interface mismatch before you build it.
The doctor doesn’t need to understand category theory. They need to read the contracts: precondition, postcondition, fidelity. The handshake is the pharmacology. The taxonomy is the PDR. The agent is the resident who can look things up fast.
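The Prescribe and Validate steps are lookups, and a sketch shows how little machinery they need. The catalog entries and fidelity numbers below are hypothetical stand-ins for the ordered taxonomy.

```python
# Illustrative taxonomy fragment (entries and fidelities invented for
# the sketch). prescribe() filters by slot and required guarantee;
# validate() checks the interface with the next slot before you build.
CATALOG = [
    {"op": "MMR re-ranking", "slot": "Attend", "pre": "scored candidates",
     "post": "ranked, diverse, bounded", "fidelity": 0.9},
    {"op": "Heap top-k", "slot": "Attend", "pre": "scored candidates",
     "post": "ranked, bounded", "fidelity": 0.95},
]

def prescribe(slot, need_post):
    hits = [e for e in CATALOG
            if e["slot"] == slot and need_post in e["post"]]
    return sorted(hits, key=lambda e: -e["fidelity"])

def validate(entry, next_pre):
    return next_pre in entry["post"]  # interface check, pre-build

for e in prescribe("Attend", "diverse"):
    print(e["op"], validate(e, "ranked"))  # MMR re-ranking True
```

Heap top-k is in the catalog but never prescribed here: it misses the diversity guarantee. That is the near-miss filter working as intended.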
Catalog
The full catalog follows. Each entry is an operation: input in, output out. If the precondition and postcondition match the contract, the operation fits the slot. Filter decides per-item admissibility: does this item pass the criterion? Attend decides how admitted items relate to each other: order, diversity, and bound are slate-level properties.
Perceive (raw → encoded) — the column every system gets right, because nothing else works until it does.
| Operation | Precondition | Postcondition |
|---|---|---|
| Lexical analysis | Raw byte stream | Token sequence, parseable |
| Parsing (LL/LR) | Token stream, conforms to grammar | AST with explicit structure, traversable |
| JSON parsing | Raw string, well-formed | Structured object, addressable by key |
| A/D conversion | Continuous analog signal | Discrete samples, quantized |
| SIFT descriptor extraction | Raw pixel grid | Keypoint descriptors, matchable |
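The first row of the table, as a sketch: raw characters in, token sequence out. The toy grammar (digits and single-character operators) is illustrative.

```python
import re

# Lexical analysis for a toy arithmetic language. Precondition: raw
# string. Postcondition: token sequence, parseable by a later stage.
TOKEN = re.compile(r"(\d+)|(\S)")

def lex(src):
    return [("NUM", int(num)) if num else ("OP", op)
            for num, op in TOKEN.findall(src)]

print(lex("12 + 34*5"))
# [('NUM', 12), ('OP', '+'), ('NUM', 34), ('OP', '*'), ('NUM', 5)]
```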
Cache (encoded → indexed) — the most studied column. Idreos (2018) built a periodic table from five design primitives.
| Operation | Precondition | Postcondition |
|---|---|---|
| Hash indexing | Records with stable keys | Keyed index, exact retrieval by key |
| B-tree index construction | Records with ordered keys | Balanced index, retrieval + range queries |
| Trie insertion | String keys over finite alphabet | Prefix-indexed, retrieval by string or prefix |
| Inverted index construction | Tokenized corpus with document IDs | Posting lists, retrieval by term |
| LSM-tree flush | Sorted runs in memory | Persistent key-value index, retrievable after compaction |
| Skip-list indexing | Ordered entries | Probabilistic index, O(log n) retrieval |
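One Cache entry as a sketch — trie insertion over plain dicts, with the contract visible in the shape: string keys in, a prefix-indexed structure out, retrieval by full key.

```python
# Trie insertion: precondition is string keys over a finite alphabet,
# postcondition is retrieval by key (and, via traversal, by prefix).
def trie_insert(root, key, value):
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node["$"] = value  # terminal marker

def trie_get(root, key):
    node = root
    for ch in key:
        if ch not in node:
            return None
        node = node[ch]
    return node.get("$")

t = {}
trie_insert(t, "cat", 1)
trie_insert(t, "car", 2)
print(trie_get(t, "car"), trie_get(t, "cab"))  # 2 None
```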
Filter (indexed → selected, strictly smaller) — gates the data store, where most systems use exact predicates. The derivation proves a gate must exist whenever outputs are a proper subset of inputs.
| Operation | Precondition | Postcondition |
|---|---|---|
| Predicate selection (WHERE) | Indexed relation + boolean predicate | Subset matching predicate, strictly smaller |
| Range query | Ordered index + interval bounds | Subset within interval, strictly smaller |
| Threshold filtering | Scored items + threshold t | Subset meeting threshold, strictly smaller |
| Regex extraction | String corpus + pattern | Matching spans retained, non-matches discarded |
| k-NN radius pruning | Metric index + query + radius r | Subset within radius, strictly smaller |
| Pareto filtering | Candidates with objective vectors | Non-dominated subset, strictly smaller |
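The last row of the table, sketched with the Filter postcondition checked inline. The assertion assumes at least one candidate is dominated; a pool of mutually non-dominated points would (correctly) fail the strictness check.

```python
# Pareto filtering: keep the non-dominated subset. The assert is the
# Filter contract — output strictly smaller than input.
def dominates(a, b):
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_filter(points):
    survivors = [p for p in points
                 if not any(dominates(q, p) for q in points if q != p)]
    assert len(survivors) < len(points), "Filter postcondition: strictly smaller"
    return survivors

pts = [(3, 4), (5, 2), (4, 4), (1, 1)]
print(pareto_filter(pts))  # [(5, 2), (4, 4)]
```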
Attend ((policy, selected) → ranked, diverse, bounded) — reads the policy store: given the survivors, which are worth pursuing? Policy is a function; it routes data. Control separates from data (derived). Most ranking algorithms satisfy order but miss diversity and bound.
| Operation | Precondition | Postcondition |
|---|---|---|
| MMR re-ranking | Candidates + relevance scores + similarity measure | Top-k ordered, diversity penalized, bounded |
| DPP top-k selection | Candidates + relevance weights + similarity kernel | Top-k ranked, mutually dissimilar, bounded |
| xQuAD re-ranking | Candidates + relevance + subtopic coverage | Top-k ordered, aspect coverage explicit, bounded |
| Submodular maximization | Candidates + submodular utility (relevance + coverage) | Top-k greedy-ranked, diminishing-return diversity, bounded |
| Diversified beam search | Stepwise expansions + diversity penalty | Top-b retained, non-redundant alternatives, bounded |
Near-misses (diagnostic counterexamples):
- Quicksort / Mergesort: order only. No diversity, no bound.
- Top-k selection: bounded, no diversity.
- PageRank: ranking, no diversity, no bound.
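MMR, the first entry in the table, is small enough to sketch whole — and all three guarantees are visible in the code. The scores, similarity function, and trade-off weight below are illustrative.

```python
# Maximal Marginal Relevance: greedy selection trading relevance
# against similarity to what's already chosen.
def mmr(candidates, relevance, sim, k, lam=0.7):
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:              # bound: top-k only
        best = max(pool, key=lambda c: lam * relevance[c]
                   - (1 - lam) * max((sim(c, s) for s in selected), default=0.0))
        selected.append(best)                      # order: pick sequence
        pool.remove(best)                          # diversity: via penalty
    return selected

rel = {"a1": 0.9, "a2": 0.85, "b1": 0.8}
same_topic = lambda x, y: 1.0 if x[0] == y[0] else 0.0  # toy similarity
print(mmr(["a1", "a2", "b1"], rel, same_topic, k=2))  # ['a1', 'b1']
```

A plain top-2 would return `['a1', 'a2']` — two items from the same topic. The similarity penalty is the whole difference between a near-miss and a contract-satisfying Attend.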
Consolidate (persisted → policy′) — the backward pass. Reads from Remember (which caches the ranked output) and writes to the substrate, reshaping how each forward stage processes on the next cycle. Contains its own inner loop: mutate, score, select. The outer pipe cannot observe this loop directly — it can only notice that Attend’s behavior changed. Babbage’s Difference Engine (1822) is the mechanical precedent: take a sequence of values, compute successive differences, extract the generating polynomial. Ranked examples in, compact rule out. The interface is Consolidate’s type signature; Babbage built it without Attend to feed it.
I-Con (2025) built a periodic table for this column. A blank cell predicted a new algorithm that beat the state of the art.
| Operation | Precondition | Postcondition |
|---|---|---|
| Gradient descent update | Loss contributions + current weights | Weights updated, future predictions altered |
| Bayesian posterior update | Prior parameters + weighted observations | Posterior compressed, future inference altered |
| K-means update | Weighted points + codebook size k | k prototypes replacing many points, lossy |
| Incremental PCA | Observations in high dimension | Low-rank basis, future projection altered |
| Decision tree induction | Ranked labeled examples | Compact rule set, future classification altered |
| Prototype condensation | Ranked candidates + compression budget | Small exemplar set, lossy approximation for future matching |
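One Consolidate entry, sketched as a single Lloyd update in one dimension: many points in, a compact lossy codebook out, and every future assignment changes because the prototypes moved.

```python
# One K-means update step (1-D for brevity): assign each point to its
# nearest centroid, then replace each centroid with its bucket's mean.
def kmeans_update(points, centroids):
    buckets = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        buckets[nearest].append(p)
    return [sum(b) / len(b) if b else centroids[i]
            for i, b in buckets.items()]

pts = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(kmeans_update(pts, [0.0, 10.0]))  # [1.0, 9.0]
```

Six points in, two prototypes out: the compression is lossy by construction, which is the contract.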
Remember (ranked → persisted) — the last forward stage. Lossless relative to its input: no additional loss at this step. Remember also serves as the cache for Consolidate: ranked outcomes are stored here, and Consolidate reads from them asynchronously. Remember is not a separate store. It is the historically shaped substrate, the part of the medium that carries the system’s past forward. A database row is Remember for the database pipe but Cache for the CRM pipe. A log entry is Remember for the logger but Cache for the monitoring pipe.
If the thing being persisted is a representation rather than the final entity, it’s Cache at this level, not Remember. The discipline: list write operations only.
| Operation | Precondition | Postcondition |
|---|---|---|
| WAL append + fsync | Serialized state record | Durable on crash, recoverable next cycle |
| Transaction commit | Validated write set | Persisted, visible for future reads |
| Git object write + commit | Content-addressed objects + manifest | Durable commit graph, retrievable by hash |
| Checkpoint serialization | In-memory model/state | Persisted checkpoint, loadable on next run |
| Copy-on-write snapshot commit | Consistent compressed state image | Persistent snapshot, addressable by version |
| SSTable flush | Immutable key-value run in memory | Durable on-disk run, retrievable by key |
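The first Remember entry, sketched. The contract is durability, not further compression — and the `fsync` is the whole postcondition: it is what turns "written" into "durable on crash".

```python
import json
import os
import tempfile

# WAL append: serialize, append, flush, fsync. Without the fsync the
# record may still be sitting in the OS page cache when power dies.
def wal_append(path, record):
    line = json.dumps(record) + "\n"
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())   # postcondition: durable on crash

path = os.path.join(tempfile.mkdtemp(), "demo.wal")
wal_append(path, {"cycle": 1, "ranked": ["a1", "b1"]})
wal_append(path, {"cycle": 2, "ranked": ["b1", "c2"]})
print(json.loads(open(path).readlines()[-1])["cycle"])  # 2
```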
Grid
The catalog is a list. A list lets you browse. Browsing doesn’t scale. You need an index. The index needs axes.
Take Filter. Two axes, selection semantics vs. error guarantee:
| | Exact | Bounded approximation | Probabilistic |
|---|---|---|---|
| Predicate | WHERE, range query | Threshold filtering (soft margin) | Bloom filter |
| Similarity | Exact NN pruning | k-NN radius pruning | LSH filtering |
| Dominance | Pareto filtering | ε-dominance filtering | Stochastic dominance |
Every cell fills with a known algorithm. Bloom filter at predicate × probabilistic — the most deployed probabilistic data structure in computing. ε-dominance at dominance × bounded — standard in multi-objective optimization. Stochastic dominance at dominance × probabilistic — decision theory since the 1960s. A clean partition with no awkward fits validates the axes, but a 3×3 where every cell fills on sight is a confirmation, not a prediction.
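The predicate × probabilistic cell, sketched. Sizes here are toy; the asymmetry is the point: "definitely not" is certain, "probably yes" is not.

```python
import hashlib

# Minimal Bloom filter: k hash positions per item, set in one big-int
# bitset. No false negatives; false positives at a rate set by m and k.
class Bloom:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits |= 1 << h

    def might_contain(self, item):
        return all(self.bits & (1 << h) for h in self._hashes(item))

b = Bloom()
b.add("alice")
print(b.might_contain("alice"))  # True — never a false negative
```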
Take Attend. Lay operations on output form vs. redundancy control:
| | None | Implicit | Explicit |
|---|---|---|---|
| Top-k slate | Heap top-k | Beam search | MMR, DPP top-k, xQuAD |
| Single best | argmax | Tournament selection | Simulated annealing, CMA-ES |
| Path/tree | Dijkstra, A* | MCTS | Portfolio solvers |
The right column filled late. CS built ranking algorithms for decades and almost never baked redundancy control into the postcondition. It was bolted on after. Simulated annealing and CMA-ES find single optima by explicitly diversifying the search: temperature schedules force basin-hopping, covariance matrices enforce spread. Portfolio solvers spawn threads with different random seeds; stochasticity encourages divergence, budget kills at deadline, final selection picks the best from a diverse pool. Biological evolution does this with mutation rate as the stochastic dial.
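The portfolio pattern, sketched on a toy two-basin objective (the function and parameters are invented for illustration): seeds force divergence, the step budget is the deadline, the final argmax picks from a diverse pool.

```python
import random

# Portfolio search: several hill-climbers with different seeds on a
# two-basin objective. Any single climber can get stuck in the shallow
# basin near x=3; the pool usually covers the deeper one near x=8.
def local_search(seed, steps=200):
    rng = random.Random(seed)
    f = lambda x: -(x - 3) ** 2 if x < 5 else -(x - 8) ** 2 + 1
    x = rng.uniform(0, 10)
    for _ in range(steps):
        cand = x + rng.gauss(0, 0.3)
        if 0 <= cand <= 10 and f(cand) > f(x):
            x = cand
    return x, f(x)

pool = [local_search(seed) for seed in range(8)]  # divergent threads
best_x, best_f = max(pool, key=lambda p: p[1])    # select from the pool
print(best_x, best_f)
```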
The grids organize the catalog. The axes partition the design space cleanly enough to prescribe: given a broken slot, name the coordinates, look up the candidate. Whether finer grids can do more — find blank cells, predict operations that haven’t been composed yet — is the next question.
Future work
The derivation establishes contracts. The catalog and grids give the pharmacy. Two things are ready now:
- Build the diagnostic agent. Describe → Diagnose → Prescribe → Validate, backed by the ordered taxonomy.
- Find the missing parts. The 3×3 grids confirm the axes but don’t predict. Finer grids with more rows surface genuine blanks — operations that should exist but haven’t been composed yet.
Three need formalization work. The existence proofs and types are defined; the composition proofs are sketched:
- Formalize in Haskell: define the six morphisms as Kleisli arrows in the Giry monad, test composition with QuickCheck
- String diagrams: visualize the pipeline in Stoch using the Markov category graphical calculus (Cho & Jacobs 2019)
- Enrichment: track bits per step using Bradley’s enriched category framework (Bradley 2021)
One needs new theory. The derivation is for linear pipelines only:
- Operads (Spivak 2013): extend from linear pipelines to fan-in/fan-out agent architectures
Every gap in the bin is an almost that hasn’t been named yet.
Written via the double loop.