Consolidation
Part of the cognition series.
Salience solved retrieval. The cache returns max-volume-k moments when coordination queries it. But it left a loose thread: “the store may compact old clusters or drop stale moments, but that’s housekeeping, not policy.” Perception Pipe left another: “the hippocampus replays salient experiences during sleep, and the ones that survive replay get written to cortex as long-term memory.”
Housekeeping that runs offline, processes winners from past competitions, and writes the survivors to a different store. That’s not housekeeping. That’s the backward pass.
What keeps winning
Run the salience cache for a day. Some moments win once and never surface again. Others keep appearing: the same deployment pattern, the same error class, the same architectural constraint. Each retrieval is independent. The DPP doesn’t know that it selected similar moments yesterday. But the pattern is real.
Three moments from different sessions: “AGPL requires derivative works to share source.” “Copyleft is irrevocable.” “CC BY-SA derivatives inherit the obligation.” Each won its own competition against different candidates. Together they form something none of them say individually: copyleft propagates through compilation. That’s not in any episode. It’s a schema, a structure that emerges from co-activation across episodes.
Complementary learning systems
McClelland, McNaughton, and O’Reilly (1995) proposed that the brain runs two learning systems. The hippocampus stores fast: one exposure is enough. The neocortex learns slow: statistical regularities across many episodes, acquired through interleaved replay. Neither works alone. Fast storage without slow consolidation produces a pile of episodes with no structure. Slow learning without fast storage can’t capture anything new.
The transfer happens offline. During sleep, the hippocampus replays recent episodes to the neocortex. The neocortex detects co-occurrence patterns across replays and forms schemas. Tompary and Davachi (2017) showed this is not summarization. Overlapping memories that share structure get reorganized into shared representations in the medial prefrontal cortex. The schema changes how future episodes are encoded. Once you have “copyleft propagates through compilation,” new episodes about license compliance get filed under the schema instead of stored as isolated events.
Sleep deprivation degrades judgment before it degrades perception. You can still see. You can’t consolidate what you saw into patterns that inform the next decision. The episodes pile up with no schemas to organize them.
The architecture
The salience cache is the hippocampus: fast writes, competitive retrieval, everything addressable. Consolidation is the neocortex: slow learning, pattern detection, offline.
The test: does the operation change how Attend works next cycle? Compaction (summarize context, merge clusters, evict stale entries) reorganizes the cache. The retrieval policy stays the same. Consolidation writes schemas that enter the DPP alongside raw episodes. The retrieval policy changes because the candidate set changed. Compaction is housekeeping. Consolidation is procedural memory.
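To make the test concrete, here is a minimal sketch in which the selection function stays fixed and only the candidate set grows. It assumes a greedy approximation to max-volume selection, with relevance scores and 2-D embeddings made up for illustration. The names `select` and `gram_det` are hypothetical and are not the series' actual implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_det(g):
    """Determinant of a symmetric positive semidefinite matrix by elimination."""
    g = [row[:] for row in g]
    d = 1.0
    for c in range(len(g)):
        pivot = g[c][c]
        if pivot <= 1e-12:      # a vanishing pivot means zero volume
            return 0.0
        d *= pivot
        for r in range(c + 1, len(g)):
            f = g[r][c] / pivot
            for k in range(c, len(g)):
                g[r][k] -= f * g[c][k]
    return d


def select(candidates, k):
    """Greedy max-volume pick. candidates maps id -> (relevance, embedding)."""
    scaled = {i: [r * x for x in e] for i, (r, e) in candidates.items()}
    chosen = []
    for _ in range(min(k, len(scaled))):
        def volume(i):
            ids = chosen + [i]
            return gram_det([[dot(scaled[a], scaled[b]) for b in ids] for a in ids])
        chosen.append(max((i for i in scaled if i not in chosen), key=volume))
    return chosen


# Assumed toy data: three raw episodes with hand-picked relevance and embeddings.
episodes = {
    "agpl": (1.1, [1.0, 0.0]),
    "irrevocable": (1.0, [0.8, 0.6]),
    "ccbysa": (0.9, [0.6, 0.8]),
}
before = select(episodes, k=2)

# Consolidation adds a schema node; select() itself is untouched.
with_schema = dict(episodes, schema=(1.5, [0.6, 0.8]))
after = select(with_schema, k=2)
```

With these numbers, `before` is `["agpl", "ccbysa"]` and `after` is `["schema", "agpl"]`. The function is identical in both calls, but what Attend returns differs because a schema joined the competition. Evicting or merging entries would only shrink or reshuffle the same pool of episodes.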
The design constraint: schemas are additive. Raw episodes stay in the cache forever. A schema is a new node that links to its source episodes, not a summary that replaces them. “Copyleft propagates through compilation” doesn’t delete the three episodes it emerged from. It sits alongside them. Future queries can match the schema directly, and when they do, the source episodes become reachable through it.
Over-merging is the failure mode. If consolidation is too aggressive, schemas absorb their sources and the store loses specificity. The cache becomes a pile of vague abstractions with no grounding. The constraint is: if you can’t trace a schema back to the episodes that formed it, the merge was lossy and wrong.
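The additive constraint can be sketched as a store that only ever inserts. This is an illustration under assumptions, not the cache's real data model: `Node`, `Store`, `add_schema`, and `trace` are hypothetical names, and the ids are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    text: str
    kind: str = "episode"      # "episode" or "schema"
    sources: tuple = ()        # episode ids a schema emerged from


class Store:
    def __init__(self):
        self.nodes = {}

    def add_episode(self, node_id, text):
        self.nodes[node_id] = Node(node_id, text)

    def add_schema(self, node_id, text, sources):
        # Refuse a schema that cannot be traced back to stored episodes.
        if not sources or any(s not in self.nodes for s in sources):
            raise ValueError("schema must link to existing source episodes")
        # Additive: a new node is inserted, nothing is removed or rewritten.
        self.nodes[node_id] = Node(node_id, text, "schema", tuple(sources))

    def trace(self, schema_id):
        """Return the source episodes reachable through a schema."""
        return [self.nodes[s] for s in self.nodes[schema_id].sources]


store = Store()
store.add_episode("e1", "AGPL requires derivative works to share source.")
store.add_episode("e2", "Copyleft is irrevocable.")
store.add_episode("e3", "CC BY-SA derivatives inherit the obligation.")
store.add_schema("s1", "Copyleft propagates through compilation.", ["e1", "e2", "e3"])
```

After the schema is added the store holds four nodes, not one. `store.trace("s1")` returns the three original episodes, and a schema with no resolvable sources is rejected at write time rather than discovered later as a lossy merge.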
The process:
- After N retrieval rounds, scan for co-activation patterns. Which moments kept winning together?
- Clusters that co-activated above a threshold become schema candidates.
- For each candidate: extract what the members share. That shared structure is the schema.
- Insert the schema as a new node with links to its source episodes.
- The schema enters the DPP on its own merits. Same embedding space, same competition.
This runs on its own clock. Not synchronous with retrieval. Not triggered by any single query. LIDA got the ordering right: consolidation happens after the conscious broadcast, not before. You act first, then consolidate. Sleep, not reflex.
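The scan and threshold steps of the process can be sketched as below. The assumptions are loud ones: co-activation is counted per pair of moments that won the same retrieval round, and clusters are formed as connected components over pairs that meet the threshold. The source does not specify either choice, and the function names and round data are hypothetical. Extracting what the members share is left out because the text does not say how it is done.

```python
from collections import Counter
from itertools import combinations


def coactivation_counts(rounds):
    """Count how often each pair of moments won in the same retrieval round."""
    counts = Counter()
    for winners in rounds:
        for a, b in combinations(sorted(set(winners)), 2):
            counts[(a, b)] += 1
    return counts


def schema_candidates(rounds, threshold):
    """Group moments linked by pairs that co-activated at least `threshold` times."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (a, b), n in coactivation_counts(rounds).items():
        if n >= threshold:
            parent[find(a)] = find(b)       # link the two components

    clusters = {}
    for moment in list(parent):
        clusters.setdefault(find(moment), set()).add(moment)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


# Assumed log of winners from five retrieval rounds.
rounds = [
    ["agpl", "copyleft", "ccbysa"],
    ["agpl", "copyleft"],
    ["copyleft", "ccbysa", "deploy"],
    ["agpl", "ccbysa"],
    ["deploy", "error"],
]
candidates = schema_candidates(rounds, threshold=2)
```

Here `candidates` is `[["agpl", "ccbysa", "copyleft"]]`. The deployment and error moments each co-occurred with something once, which is below the threshold, so they stay as plain episodes. Because the function takes a log of past rounds, it can run on its own schedule, independent of any single query.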
What’s solved, what’s not
Perception Pipe said “the consolidation problem is largely solved” and pointed at Mem0 and Zep. Half right. The storage is commodity. Zep’s Graphiti builds temporal knowledge graphs from episodes. GraphRAG does community detection and summarization over entity graphs. Both handle the mechanics of storing, linking, and retrieving. By the test above, both are compaction: they reorganize the cache without changing the retrieval policy. Union-find compaction is the same. Better provenance, same limitation.
What neither solves is the selection. Which clusters earn schema status? The competitive process that decides “this pattern matters, that one doesn’t” is the same inhibitory mechanism found in every other role. Perception Pipe runs it on incoming events. Salience runs it on retrieval candidates. Consolidation runs it on clusters. Same geometry, different timescale. Moments compete in milliseconds. Schemas compete over days.
Six roles
The forward pass:
- Perceive — Caret Recorder captures raw screen activity as semantic units.
- Cache — Moments segments events into composable chunks.
- Filter — Perception Pipe runs competitive inhibition. Winners suppress losers.
- Attend — Salience retrieves max-volume-k via DPP. Diverse, not redundant.
- Remember — publish to Canon. The durable output that feeds the next cycle’s Perceive.
The backward pass:
- Consolidate — schemas form offline from repeated co-activation. Reads from Remember, writes to the substrate. Additive, never lossy.
Residual connections throughout: raw episodes stay addressable at every role. Schemas add structure. The original signal is never destroyed.
I never read Attention Is All You Need. I started from “how does a person turn lived attention into publishable prose” and arrived here. The architecture is the same.
| Transformer | Cognition pipeline |
|---|---|
| Tokenization | Caret Recorder — raw signal to discrete units |
| Positional encoding | Moments — temporal structure over tokens |
| Softmax attention | Perception Pipe — normalized competition across all candidates |
| Multi-head attention | Salience + DPP — attend to diverse subspaces, not redundant copies |
| Context window | Canon (Remember) — the durable output, addressable by the next cycle |
| Backpropagation | Consolidation — offline compression into higher-level features |
| Residual connections | Raw episodes stay addressable; schemas are additive |
Vaswani et al. derived it from sequence-to-sequence translation. This series derived it from lived experience. Same destination, no shared path. The convergence suggests the architecture is not an engineering choice. It’s what information processing looks like when you need to perceive, cache, filter, attend, and remember — then consolidate offline. The Natural Framework finds the same six roles across twenty-four domains, from neurons to evolution.
One difference. A transformer remembers as frozen weights behind an API. This pipeline remembers as open prose. The output is public. Attention goes in. Canon comes out.
Written via the double loop. More at pageleft.cc.