Prescription: Soar

Part of the cognition series. Follows the Diagnosis: Soar.

The diagnosis found one systematic gap: procedural memory is the only store with a backward pass. Semantic memory, episodic memory, and perceptual LTM have no automatic learning mechanism that writes back to them. Chunking and RL are excellent compilers. They just don’t reach the other stores.

The prescription pulls from the parts bin: existing algorithms, matched to Soar’s data structures, that fill the missing Consolidate cells.

1. Episodic→Semantic consolidation

Semantic memory “does not have an automatic learning mechanism” (§6, p.13). Episodes accumulate in EPMEM without generalization (§7, p.13). The missing operation: read episodes, detect regularities, write compressed knowledge to SMEM.

|          | Before                                    | After                                                     |
|----------|-------------------------------------------|-----------------------------------------------------------|
| SMEM     | Grows only by agent command or preloading | Grows automatically from experience                       |
| EPMEM    | Episodes accumulate without bound         | Regularities extracted, episodes compacted                |
| Transfer | Agent re-learns spatial layouts per task  | Recurring layouts written to SMEM, available across tasks |
| Rosie    | Needs human guidance for every new domain | Generalizes from prior task episodes autonomously         |

Contract: Semantic memory reflects what the agent has experienced, not just what it was told.

Sequences of graph snapshots

EPMEM stores episodes as change-deltas indexed by decision cycle number (§7, p.13). That’s a sequence of graph snapshots: each episode is a set of WME changes, and each WME is a node in an attribute-value graph.

SMEM stores graph structures in SQLite (§6, p.12). The target is a graph database.

The parts bin maps this to two cells:

| Stage       | Sequence          | Graph        |
|-------------|-------------------|--------------|
| Consolidate | Grammar induction | GNN training |

Grammar induction detects recurring patterns in sequences. GNN training learns representations over graphs. For Soar’s symbolic graph structures, the closer fit is graph coarsening: collapse groups of co-occurring nodes across episodes into supernodes while preserving structural properties (graph reduction survey, 2024). CLARION’s Rule-Extraction-Refinement does something analogous at the implicit→explicit boundary (Sun, 2016). No cognitive architecture does it at the episodic→semantic boundary.

Composition + test

Casteigts et al. (2019) model temporal data as a sequence of static graphs G₁, G₂, …, Gδ. Two abstract operations:

  1. Compose: merge two adjacent graphs in the sequence into one.
  2. Test: evaluate a predicate on a composed graph.

Any parameter computable through compose + test runs in O(δ) operations, which is optimal (Casteigts et al., 2019). The framework is generic: swap the compose and test functions, compute a different regularity.
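A minimal sketch of the pattern in Python, using edge frequency as the regularity (an illustrative choice, not one from the paper): compose accumulates counts across snapshots in a single O(δ) pass; test keeps the edges seen often enough.

```python
from collections import Counter

def frequent_edges(snapshots, k):
    """Compose + test in one O(delta) pass: compose accumulates edge
    counts across snapshots; test keeps edges seen in >= k of them."""
    counts = Counter()
    for edges in snapshots:
        counts.update(set(edges))                     # compose: union with counts
    return {e for e, c in counts.items() if c >= k}   # test: frequency threshold

snapshots = [
    {("door", "locked"), ("key", "on-table")},
    {("door", "locked"), ("agent", "moving")},
    {("door", "locked"), ("key", "held")},
]
stable = frequent_edges(snapshots, k=3)   # {("door", "locked")}
```

Swapping the Counter update and the threshold predicate for other compose/test pairs yields other regularities over the same snapshot sequence.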

For Soar:

Temporal composition: episodes G₁...Gδ → compose (union + count) → test (freq ≥ k?) → new SMEM fact
| Operation | Soar implementation |
|-----------|---------------------|
| Compose   | Union of WME structures across episodes in a window. Co-occurring attribute-value pairs accumulate counts. |
| Test      | Does the composed structure meet a frequency threshold? Has this operator sequence appeared in ≥ k episodes? |
| Write     | Create new SMEM graph structure encoding the regularity. Set initial base-level activation from frequency count. |
| Trigger   | Goal completion, idle time, or episode accumulation threshold. The architectural equivalent of sleep. |
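The four rows can be sketched as one loop. This is a toy model, not Soar's API: SMEM is a plain dict, episodes are sets of attribute-value tuples, and `window` (the episode-accumulation trigger) and `k` are hypothetical parameters.

```python
import math
from collections import Counter

class Consolidator:
    """Sketch of the Compose/Test/Write/Trigger cycle over episodes."""

    def __init__(self, k=3, window=50):
        self.k, self.window = k, window
        self.buffer = []   # episodes awaiting consolidation
        self.smem = {}     # structure -> base-level activation

    def observe(self, episode):
        self.buffer.append(frozenset(episode))
        if len(self.buffer) >= self.window:   # Trigger: accumulation threshold
            self.consolidate()

    def consolidate(self):
        counts = Counter()                    # Compose: union + count
        for ep in self.buffer:
            counts.update(ep)
        for wme, c in counts.items():         # Test: frequency >= k
            if c >= self.k and wme not in self.smem:
                self.smem[wme] = math.log(c)  # Write: seed activation from count
        self.buffer.clear()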

Prior art: Zep

Zep (Rasmussen, 2025) implements this pipeline for LLM agents with three hierarchical subgraph tiers:

  1. Episodic subgraph: raw episodes stored losslessly with dual timestamps
  2. Semantic subgraph: entities and relations extracted from episodes, resolved against existing graph nodes
  3. Community subgraph: clusters of strongly-connected semantic entities, summarized at a higher level

Entity extraction + resolution produces the semantic subgraph. Label propagation clusters it into communities, extending incrementally as new episodes arrive. Soar’s SMEM already has the graph database and activation-based retrieval.
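A minimal label-propagation sketch (deterministic tie-breaking added so runs are reproducible; Zep's incremental version is more involved): each node repeatedly adopts the most common label among its neighbors until labels stabilize into communities.

```python
from collections import Counter

def label_propagation(adj, rounds=20):
    """Each node adopts the most common label among its neighbors;
    ties break toward the smallest label for determinism.
    `adj` maps node -> set of neighbor nodes."""
    labels = {n: n for n in adj}   # start with one unique label per node
    for _ in range(rounds):
        changed = False
        for n in sorted(adj):
            if not adj[n]:
                continue
            counts = Counter(labels[m] for m in adj[n])
            best = min(counts, key=lambda l: (-counts[l], l))
            if labels[n] != best:
                labels[n], changed = best, True
        if not changed:
            break
    return labels
```

On two disjoint triangles, the nodes of each triangle converge to a shared label, giving two communities.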

Verification

New SMEM structures should decay if they don’t match future episodes. Base-level activation already biases retrieval by recency and frequency (§6, p.12). A generated structure that is never retrieved will naturally lose activation. One addition: on retrieval, compare the generalization against the current episode. If it contradicts, mark for review. This closes the loop.
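A sketch of both halves of the loop, modeling WMEs as attribute→value pairs. The decay formula is the standard base-level form (ln Σ ageᵢ⁻ᵈ), consistent with the recency/frequency bias described above; the contradiction test is an illustrative definition, not Soar's.

```python
import math

def base_level_activation(retrieval_ages, d=0.5):
    """Base-level activation: ln(sum age^-d) over past retrievals.
    Structures that are never retrieved accumulate no terms and fall
    out of competition; d is the usual decay parameter."""
    return math.log(sum(age ** -d for age in retrieval_ages))

def contradicts(generalization, episode):
    """On retrieval, a consolidated fact {attr: value} is marked for
    review if the current episode asserts a different value for the
    same attribute."""
    return any(attr in episode and episode[attr] != value
               for attr, value in generalization.items())
```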

2. Chunking–RL composition

Chunking requires deterministic substate results, but RL uses stochastic selection. The two cannot compose (§4, p.10). Laird’s planned fix: gate chunking on RL convergence.

|          | Before                                       | After                                                |
|----------|----------------------------------------------|------------------------------------------------------|
| Chunking | Disabled when RL drives the decision         | Fires once RL preferences converge                   |
| RL       | Numeric preferences tuned but never compiled | Stable policies compiled to production rules         |
| Speed    | RL computation every decision cycle          | Converged decisions fire as direct rule matches      |
| Chunks   | Accumulate without review                    | Dead chunks detected by trauma recurrence, retracted |

Contract: Any stable decision policy, whether discovered by deliberation or by RL, eventually compiles into a direct rule.

Posterior convergence gate

The simplest convergence test: track the exponential moving average of Q-value changes for each RL rule. When the EMA drops below a threshold, the preference has stabilized. This requires no architectural modification to Soar’s RL, just a monitoring wrapper. A Bayesian posterior over Q-values would give a full uncertainty estimate, but it requires changing how Soar stores RL values from point estimates to distributions. EMA is sufficient.
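The monitoring wrapper fits in a few lines. A sketch, assuming per-rule tracking outside Soar; `alpha` and `tau` are tuning assumptions, not Soar settings.

```python
class ConvergenceGate:
    """Track an exponential moving average of |delta-Q| per RL rule;
    flag a rule as converged once the EMA drops below tau."""

    def __init__(self, alpha=0.1, tau=0.01):
        self.alpha, self.tau = alpha, tau
        self.ema = {}   # rule id -> EMA of |delta Q|

    def update(self, rule, delta_q):
        # First update seeds the EMA at |delta_q| itself.
        prev = self.ema.get(rule, abs(delta_q))
        self.ema[rule] = (1 - self.alpha) * prev + self.alpha * abs(delta_q)

    def converged(self, rule):
        return rule in self.ema and self.ema[rule] < self.tau
```

Once `converged` returns true for every rule backing a decision, that decision can be flagged deterministic and handed to chunking.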

ACT-R’s production compilation + utility learning is the closest parallel in another architecture, but there too the interaction is uncoordinated: compilation fires whenever two productions fire in sequence, regardless of whether utility values have stabilized. The gating idea is novel.

For Soar:

| Step     | Soar implementation |
|----------|---------------------|
| Track    | For each RL rule, maintain an EMA of \|ΔQ\| across recent updates. |
| Gate     | When EMA drops below threshold τ, the preference has converged. Flag the decision as deterministic. |
| Compile  | Chunking fires on the flagged substate. The resulting chunk encodes the converged policy as a production rule. |
| Continue | RL keeps running on unconverged decisions. Chunking picks off the stable ones as they settle. |
RL-gated chunking: RL rules → TD updates → variance check → if converged, chunking fires → new production rule

Pruning dead chunks

Chunks accumulate. EBBS improved chunk quality (§4, p.10) but nothing prunes the store. The parts bin suggests prototype condensation again: periodically scan the chunk store, identify rules that never fire (dead code) or that are superseded by more specific rules, and retract them.

The trigger is the trauma recurrence heuristic: count(similar_failures) > 1 signals a bad chunk. If the same impasse type recurs despite a chunk that should prevent it, the chunk is wrong. Retract and rechunk.
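A sketch of the retraction trigger, with an illustrative chunk store keyed by impasse signature (the signature function and store layout are assumptions, not Soar's):

```python
from collections import Counter

class ChunkReviewer:
    """Trauma-recurrence heuristic: if the same impasse signature recurs
    more than once despite a chunk that should prevent it, retract that
    chunk so it can be re-learned."""

    def __init__(self):
        self.failures = Counter()

    def record_impasse(self, signature, chunk_store):
        self.failures[signature] += 1
        if self.failures[signature] > 1 and signature in chunk_store:
            del chunk_store[signature]       # retract; rechunk on next impasse
            self.failures[signature] = 0
```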

3. Episodic discrimination

Every decision cycle produces an episode (§7, p.13). At ~50ms per cycle (§10, item 3, p.18), that’s 72,000 episodes per hour. Retrieval cost grows with total count.

|           | Before                                  | After                                               |
|-----------|------------------------------------------|-----------------------------------------------------|
| Storage   | Every decision cycle → episode           | Only novel/important cycles stored at full fidelity |
| Volume    | 72,000 episodes/hour, unbounded          | Fraction stored, bounded by importance threshold    |
| Retrieval | Old episode cost grows with total count  | Smaller store, faster retrieval                     |
| Quality   | Routine episodes dilute the store        | DPP thinning ensures stored episodes are diverse    |

Contract: Retrieval cost stays proportional to important episodes, not total decision cycles.

DPP importance gate

The simplest approach is surprise-based gating: store an episode when its content deviates significantly from predictions. Isele & Cosgun (AAAI 2018) identified four criteria for selective experience replay: surprise (high TD error), reward, distribution matching, and coverage maximization. For write-time gating in Soar:

| Signal     | Implementation |
|------------|----------------|
| Relevance  | Reward proximity (positive or negative), impasse resolution, operator application that changed the goal stack. |
| Similarity | WME overlap with recent episodes. High overlap = routine. Low overlap = novel. |
| Gate       | Episodes with high relevance and low similarity to recent episodes are stored at full fidelity. Routine episodes are stored at reduced fidelity or skipped. |
Episodic discrimination: decision cycle → importance score → DPP gate → novel episodes stored at full fidelity, routine episodes skipped
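A sketch of the per-cycle gate, modeling episodes as sets of WMEs. The Jaccard overlap measure and the `overlap_max` cutoff are illustrative choices, not part of Soar.

```python
def should_store(episode, recent, reward, resolved_impasse, overlap_max=0.8):
    """Write-time gate: store at full fidelity when relevance is high
    (nonzero reward or an impasse was resolved) and WME overlap with
    recent episodes is low."""
    relevant = reward != 0 or resolved_impasse
    overlap = max((len(episode & r) / len(episode | r) for r in recent),
                  default=0.0)
    return relevant and overlap < overlap_max
```

The gate is cheap enough to run every decision cycle; it only touches the current episode and a small window of recent ones.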

For agents that encounter many similar surprising episodes (e.g., repeated failures at the same task), a periodic DPP-based thinning pass can ensure the stored set stays diverse. This separates the hot path (surprise gate, cheap, per-cycle) from the batch operation (DPP thinning, periodic, ensures diversity). Neuroscience supports this two-phase model: synaptic tagging marks salient episodes at encoding time; sharp-wave ripples during sleep selectively consolidate them (Science, 2024).
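The batch pass can be sketched with greedy diversity selection, a cheap stand-in for exact DPP sampling that maximizes diversity in the same spirit (the similarity measure and selection rule are illustrative):

```python
def jaccard(a, b):
    """Set-overlap similarity between two episodes."""
    return len(a & b) / len(a | b) if a | b else 0.0

def thin(episodes, k, sim=jaccard):
    """Greedy diversity thinning: repeatedly keep the episode least
    similar to anything already kept, until k survive."""
    kept = [episodes[0]]
    while len(kept) < min(k, len(episodes)):
        candidates = [e for e in episodes if e not in kept]
        kept.append(min(candidates,
                        key=lambda e: max(sim(e, s) for s in kept)))
    return kept
```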

Fewer episodes, faster consolidation

Treatments 1 and 3 reinforce each other. Discrimination reduces the volume; consolidation reads the survivors and compresses them into semantic knowledge.

Four cells, four algorithms

| Treatment               | Parts bin cell           | Soar module                 | Effect |
|-------------------------|--------------------------|-----------------------------|--------|
| Episodic→Semantic       | Graph coarsening         | EPMEM → SMEM                | World knowledge grows from experience |
| RL-gated chunking       | EMA convergence gate     | RL rules → production rules | Stochastic selection compiles to deterministic rules |
| Chunk review            | Trauma recurrence        | Procedural memory           | Dead and wrong chunks retracted |
| Episodic discrimination | Surprise gate + DPP thin | Episodic learning           | Store fewer, better episodes |

None of these require new architectural commitments. The algorithms exist. The modules exist. The triggers are straightforward.

And yet.

The wall behind the walls

Gate chunking on RL convergence. Consolidate episodes into semantic knowledge. Discriminate what’s worth remembering. Soar would be a more complete architecture. It would still need a human.

Laird says it plainly: “Without a human to closely guide it, Rosie is unable to get very far on its own, especially in being unable to learn new abstract symbolic concepts on its own” (§10, p.20). Rosie is Soar’s most capable agent. It learns sixty tasks from natural language instruction. It needs an instructor for every one.

This isn’t a failure of Rosie. It’s a pattern. TacAir-Soar’s 8,000 rules were written by engineers. PROPS takes declarative rule descriptions authored by people. Every successful Soar agent in the appendix has a human somewhere in the causal chain, providing knowledge that the architecture cannot generate from experience alone.

Soar treats this as a development-time dependency, something to be engineered away with the next learning mechanism. Laird knows better. His own assessment: “What I feel is most missing from Soar is its ability to ‘bootstrap’ itself up from the architecture and a set of innate knowledge into being a fully capable agent” (§10, p.20). Forty years of evidence supports the stronger reading: the human is load-bearing.

Complementation first

The research program has been: build the autonomous agent, then optionally add human interaction. Instructo-Soar and Rosie add a teacher, but as one application among many. The architecture doesn’t require it.

What if the order is reversed?

Step one: complementation. Human and agent, each filling the other’s gaps. The agent has speed, consistency, parallel rule firing, tireless processing; the human has judgment, novel category formation, the ability to say “this is what matters.” Neither is intelligent alone. Together they are.

Step two: bootstrap. Use that composite intelligence to study what it does. Compress its patterns. Gradually move capabilities from the human side to the agent side. Chunking already does this at the substate level: compile deliberation into direct rules. The same principle, applied at the architecture level: compile human guidance into autonomous capability.

Soar’s forty-year trajectory suggests that generating the knowledge to fill these cells requires more intelligence than any single agent has demonstrated. If the architecture can’t generate it alone, it has to borrow it. The only place it currently exists at the required level is people. Solving complementation first isn’t a retreat from autonomy. It’s the path to it.

Soar already has the mechanism. An impasse means “I don’t have enough knowledge.” A substate means “go get it.” What if the default resolution of certain impasses was “ask the human”? Not as a fallback. As a first-class architectural response.

Chunking would compile the answer. The agent never asks the same question twice. Over time, fewer impasses reach the human. The boundary moves.

The four prescriptions above are correct. They fill real gaps. But they’re step two. Step one is acknowledging that the human who has always been there, quietly making every Soar agent work, belongs in the architecture.

What flows back

The prescription draws from the parts bin. Soar gives back. Six algorithms from Soar’s architecture are now in the parts bin:

| Algorithm              | Parts bin cell      | What it does |
|------------------------|---------------------|--------------|
| RETE network           | Cache × graph       | Incremental pattern matching. Processes WME deltas, caches partial match state. Pays for change, not total knowledge. |
| Truth maintenance      | Filter × graph      | I-supported structures auto-retract when their creating rule unmatches. Data-driven, no explicit delete. |
| Staged preferences     | Attend × flat       | Reject-first, rank-survivors. Process prohibit/reject before better/worse/best. Exit early if rejection resolves it. |
| EBC/Chunking           | Consolidate × tree  | Backtrace through dependency tree, identify necessary conditions, compile deliberation into production rule. |
| Delta-bar-delta        | Consolidate × flat  | Per-production adaptive learning rate. Each rule gets its own alpha, updated on every RL update. |
| Delta episodic storage | Remember × sequence | Store only changes between snapshots. Interval representation for persistent elements. |

Soar’s largest contribution is forty years of evidence for the ordering constraint. Every agent in the appendix is a data point: the right mechanisms, assembled by a human. Together they produced intelligent behavior that neither could produce alone. That’s complementation. It was always complementation. The architecture just didn’t have a name for it.


Based on Laird (2022), “Introduction to the Soar Cognitive Architecture”; Casteigts et al. (2019), “Computing Parameters of Sequence-Based Dynamic Graphs”; Rasmussen (2025), “Zep: A Temporal Knowledge Graph Architecture”; the parts bin; and the Natural Framework. Written via the double loop.