cons
Part of the cognition and methodology series. Builds on Functor Wizardry and The Natural Framework.
A pipeline that doesn’t close degenerates. Each run produces episodes that vanish. The procedures calcify. Bugs that got fixed stay fixed, but the method that fixed them never sharpens. You’re running the same spells on harder problems with the same apprentice-level skill. The pipe leaks where experience should become knowledge.
Functor Wizardry left that joint open. The pipeline composes, but the circle doesn’t close — no epmem → smem morphism, no monoid. Here’s how it closes.
126 turns, 4 days
I worked on Soar across 24 sessions. The consolidation harness logged every turn — 21,000 action records total, 1,369 in Soar sessions, 126 with Soar-relevant intent. Here’s what those 126 turns actually did:
Perceive the history. Load action records — tool sequences, prompt intents, timestamps. 21,000 entries. Grep for Soar-relevant turns: 126 survive.
Filter the noise. TF-IDF on prompt tokens drops the “yes” and “sure” confirmations, the deploy cycles, the PageLeft crawls. What survives is the arc: research → diagnose → prescribe → implement → write.
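A minimal sketch of that filter, assuming each action record is a dict with a `prompt` field. The record schema, the whitespace tokenizer, and the above-the-mean cutoff are all illustrative, not the harness's actual implementation:

```python
import math
from collections import Counter

def tfidf_scores(prompts):
    """Score each prompt by the summed TF-IDF weight of its tokens.

    Confirmations like "yes" repeat across many turns, so their tokens
    carry low IDF; distinctive research and diagnosis prompts have rare
    tokens and accumulate weight.
    """
    docs = [p.lower().split() for p in prompts]
    n = len(docs)
    df = Counter(tok for d in docs for tok in set(d))  # document frequency
    return [sum(count * math.log(n / df[tok]) for tok, count in Counter(d).items())
            for d in docs]

def filter_noise(records):
    """Keep only the records scoring above the corpus mean."""
    scores = tfidf_scores([r["prompt"] for r in records])
    mean = sum(scores) / len(scores)
    return [r for r, s in zip(records, scores) if s > mean]
```

On a corpus of ten "yes"/"sure" confirmations and two substantive prompts, the confirmations score well under the mean and drop out; the arc survives.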
Attend the arc. Rank the survivors into phases, identify the decision points a human can’t skip:
- Research (Mar 19–23): fetch arxiv papers, read Soar source, email Laird. 40+ turns of WebFetch → Read(.pdf) → Grep → Read(.cpp).
- Diagnose (Mar 23–25): map to framework roles, substantiate against source code. Read(.md) → Agent → Grep → Edit(.md).
- Prescribe (Mar 25): look up parts bin, propose candidates. Read(.yml) → Edit(.yml) → Edit(.md).
- Implement (Mar 25): draft PRs against Soar repo. Read → Agent → Bash(gh).
- Write up (Mar 25–27): draft posts, humanize, codex review. Edit(.md) → Skill(humanize) → Skill(codex) → Bash(deploy).
Five phases. Five human decisions: what’s broken, which diagnosis, which algorithm, which PR, which framing. Everything between decisions is tool sequences that repeat.
Remember the result: this post and the skill spec it implies. Both prose with contracts.
Consolidate: compress the arc into a pipeline that runs next time. The test isn’t scientific — it’s a demo. The checkpoints are defined, the intermediate documents are identified, and each one only has to satisfy the postcondition that feeds the next step. A consolidated “diagnose foreign system” skill would run:
- Research (automated) → checkpoint: human picks what’s broken → diagnosis document
- Diagnose (automated) → checkpoint: human confirms role mapping → prescription document
- Prescribe (automated) → checkpoint: human selects algorithm → spec document
- Implement (automated) → checkpoint: human reviews PR → code
- Write up (automated) → checkpoint: human edits framing → published post
Each checkpoint produces a prose document. Each document compresses the sessions that preceded it. Each only needs to be good enough for the next phase. The quality bar is the contract, not perfection.
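Sketched as code, the consolidated skill is a fold over phases: each automated step drafts a document, a human checkpoint approves or edits it, and the approved document seeds the next phase. The `Phase` type and the callables are hypothetical, not a real harness API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Phase:
    name: str                        # e.g. "research", "diagnose"
    run: Callable[[str], str]        # automated: prior document -> draft
    approve: Callable[[str], str]    # checkpoint: human accepts or edits the draft

def run_skill(phases, seed):
    """Thread a document through the phases. Each phase's output only
    has to satisfy the next phase's precondition: the contract is the
    quality bar, not perfection."""
    doc = seed
    for phase in phases:
        doc = phase.approve(phase.run(doc))
    return doc
```

With stand-in lambdas, `run_skill([Phase("research", lambda d: d + " -> findings", lambda d: d)], "broken build")` yields the research document that the diagnose phase would consume next.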
126 turns, 4 days, every turn requiring human attention → 5 checkpoints, human attention on high-level views only. That’s the compression. Converge at the contract, not at the surface.
Skills and blog posts are the same data type — a skill says “when you see X, do Y,” a post says “when you see X, here’s why Y works.” The skills directory and the grimoire are the same cache. cons adds a new element. The list grows, the algebra stays uniform.
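The shared data type can be made literal. A hypothetical sketch, where one record renders either way and `cons`-ing onto the cache is just appending an element:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    trigger: str   # X: the situation
    response: str  # Y: what to do about it

    def as_skill(self) -> str:
        return f"when you see {self.trigger}, do {self.response}"

    def as_post(self) -> str:
        return f"when you see {self.trigger}, here's why {self.response} works"

# one cache, uniform algebra; appending is the only operation
cache = [Entry("a foreign codebase", "the five-phase diagnosis")]
cache.append(Entry("a recurring tool sequence", "consolidation"))
```

Same fields, two renderings; the skills directory and the grimoire differ only in which method gets called.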
Closing the loop
The loop closes when the output changes the input. A new skill means future sessions run differently. The action store shifts. Next time the pipeline runs, it perceives a different distribution.
If the skill worked, the old pattern vanishes — absorbed. If it didn’t, the old pattern persists and the pipeline proposes again. The pipeline is its own error signal.
- Cycle 0: No skills. 126 raw turns. Pipeline extracts the five-phase arc.
- Cycle 1: Diagnosis skill installed. Next foreign codebase: 5 checkpoints instead of 126 turns. Pipeline perceives the compressed distribution.
- Cycle n: Skills compose. The pipeline finds higher-order patterns — sequences of skills, not sequences of tools. Each cycle’s proposals reflect the current library, not the original raw logs.
Each cycle compresses. The data processing inequality guarantees each has no more information to work with than the last, and absorption makes the decrease strict while recurring patterns remain. The cycles converge.
The monoid
Without the epmem → smem morphism, the three stores don’t compose and you don’t get monoidal structure. With it:
- epmem → smem: 126 turns become a five-phase arc. Episodes become patterns.
- smem → pmem: The arc becomes a skill spec with five checkpoints. Patterns become procedures.
- pmem → epmem: Run the skill on the next codebase. Procedures produce new episodes.
Composition is associative. The identity is null consolidation — nothing learned, cache unchanged. The monoid closes.
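A toy model of the three morphisms, assuming the cache is a dict with `epmem`/`smem`/`pmem` fields: each morphism is an endomorphism on the store, function composition is associative, and the do-nothing function is the identity. Field names and the compression logic are illustrative:

```python
def epmem_to_smem(store):
    """Episodes -> patterns: compress raw turns into a phase arc."""
    phases = sorted({turn["phase"] for turn in store["epmem"]})
    return {**store, "smem": phases}

def smem_to_pmem(store):
    """Patterns -> procedures: the arc becomes a checkpointed skill."""
    return {**store, "pmem": [f"{p} -> checkpoint" for p in store["smem"]]}

def pmem_to_epmem(store):
    """Procedures -> episodes: running the skill logs new, fewer turns."""
    return {**store, "epmem": [{"phase": step.split(" -> ")[0]}
                               for step in store["pmem"]]}

def identity(store):
    """Null consolidation: nothing learned, cache unchanged."""
    return store

def compose(*fs):
    """Left-to-right composition of store endomorphisms."""
    def run(store):
        for f in fs:
            store = f(store)
        return store
    return run

cycle = compose(epmem_to_smem, smem_to_pmem, pmem_to_epmem)
```

Associativity means any regrouping of the three steps computes the same store, and composing with `identity` changes nothing, which is exactly the monoid structure.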
What comes free
Idempotency. Run the full cycle twice on the same action store, same result. The pipeline converges to the skill set that absorbs all recurring patterns. Two iterations to convergence, same as the slop-detection finding.
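Idempotency falls out of any absorbing consolidation step. A toy model, where a "pattern" is just a tool sequence that recurs; the two-occurrence threshold is an assumption, not the harness's rule:

```python
from collections import Counter

def consolidate(skills, turns):
    """One cycle: any tool sequence seen at least twice becomes a skill.
    Turns already covered by a skill are absorbed and drop out of the
    distribution the next cycle perceives."""
    remaining = [t for t in turns if t not in skills]
    recurring = {seq for seq, n in Counter(remaining).items() if n >= 2}
    return skills | recurring

def run_to_fixpoint(turns):
    """Iterate consolidation until the skill set stops changing."""
    skills, prev = set(), None
    while skills != prev:
        prev, skills = skills, consolidate(skills, turns)
    return skills
```

The first pass absorbs every recurring pattern; the second finds nothing left to absorb and confirms the fixed point. Running `consolidate` again on the converged skill set returns it unchanged.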
Self-correction. A bad skill creates new recurring patterns — workarounds. Next cycle detects them, proposes replacements. The error signal is the action store itself.
Compositionality. Skills compose because they’re endofunctors on the same cache. The pipeline detects composed sequences as new patterns and proposes higher-order skills. Composition emerges from the monoid, not from explicit design.
Time compression. 126 turns across 4 days → 1 run with 5 human checkpoints. The human attention compresses from every turn to five decisions. Everything between checkpoints is mechanical. That’s the concrete payoff: not “the math is elegant” but “the next diagnosis takes hours instead of days.”
The harness does Filter without Attend — it rejects noise but can’t select signal. Which candidates become real skills is still my call. The pipeline delivers candidates; I decide which ones ship. The monoid handles the compression, the human handles the judgment.
That division is load-bearing. Automate it and you have a monad — the system improving its own improvement process. That’s a different post.
Written via the double loop. This post is its own demo: the conversation that wrote it performed perceive → filter → attend → remember on the Soar episodes, and the result is the consolidation record you just read.