Prescription: Soar
Part of the cognition series. Follows Diagnosis: Soar.
The diagnosis found one systematic gap: procedural memory is the only store with a backward pass. Semantic memory, episodic memory, and perceptual LTM have no automatic learning mechanism that writes back to them. Chunking and RL are excellent compilers. They just don’t reach the other stores.
The prescription pulls from the parts bin: existing algorithms, matched to Soar’s data structures, that fill the missing Consolidate cells.
1. Episodic→Semantic consolidation
Semantic memory “does not have an automatic learning mechanism” (§6, p.13). Episodes accumulate in EPMEM without generalization (§7, p.13). The missing operation: read episodes, detect regularities, write compressed knowledge to SMEM.
| | Before | After |
|---|---|---|
| SMEM | Grows only by agent command or preloading | Grows automatically from experience |
| EPMEM | Episodes accumulate without bound | Regularities extracted, episodes compacted |
| Transfer | Agent re-learns spatial layouts per task | Recurring layouts written to SMEM, available across tasks |
| Rosie | Needs human guidance for every new domain | Generalizes from prior task episodes autonomously |
| Contract | Semantic memory reflects what the agent has experienced, not just what it was told. | |
Sequences of graph snapshots
EPMEM stores episodes as change-deltas indexed by decision cycle number (§7, p.13). That’s a sequence of graph snapshots: each episode is a set of WME changes, and each WME is a node in an attribute-value graph.
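The change-delta representation can be sketched in a few lines. This is a hypothetical illustration, not Soar's EPMEM code: each episode records only the WMEs added and removed since the previous decision cycle, and any snapshot is reconstructed by replaying deltas up to a cycle number.

```python
# Hypothetical sketch of delta-based episode storage: store only WME
# changes per decision cycle; rebuild a full snapshot by replaying them.

class DeltaEpisodeStore:
    def __init__(self):
        self.deltas = []  # one (added, removed) pair per decision cycle

    def record(self, added, removed):
        """Store the WME changes for one decision cycle."""
        self.deltas.append((frozenset(added), frozenset(removed)))

    def snapshot(self, cycle):
        """Rebuild working memory as of a given cycle number."""
        wmes = set()
        for added, removed in self.deltas[:cycle + 1]:
            wmes |= added
            wmes -= removed
        return wmes

store = DeltaEpisodeStore()
store.record({("s1", "on", "table"), ("s1", "color", "red")}, set())
store.record({("s1", "on", "block2")}, {("s1", "on", "table")})

assert store.snapshot(0) == {("s1", "on", "table"), ("s1", "color", "red")}
assert store.snapshot(1) == {("s1", "on", "block2"), ("s1", "color", "red")}
```

The cost of storage is proportional to change, not to total working-memory size, which is the same economy RETE applies to matching.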
SMEM stores graph structures in SQLite (§6, p.12). The target is a graph database.
The parts bin maps this to two cells:
| Stage | Sequence | Graph |
|---|---|---|
| Consolidate | Grammar induction | GNN training |
Grammar induction detects recurring patterns in sequences. GNN training learns representations over graphs. For Soar’s symbolic graph structures, the closer fit is graph coarsening: collapse groups of co-occurring nodes across episodes into supernodes while preserving structural properties (graph reduction survey, 2024). CLARION’s Rule-Extraction-Refinement does something analogous at the implicit→explicit boundary (Sun, 2016). No cognitive architecture does it at the episodic→semantic boundary.
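A minimal coarsening pass can be sketched as follows. This is an illustrative reduction of the idea, with hypothetical names, not an implementation from the cited survey: WMEs that co-occur in enough episodes become candidate supernodes to write to SMEM.

```python
# Graph-coarsening sketch (names hypothetical): WME pairs that co-occur
# in at least `threshold` episodes are collapsed into candidate
# supernodes, which would then be written to SMEM as single structures.

from itertools import combinations
from collections import Counter

def coarsen(episodes, threshold):
    """episodes: list of sets of WMEs. Returns frozensets of WMEs that
    co-occurred in >= threshold episodes (candidate supernodes)."""
    pair_counts = Counter()
    for episode in episodes:
        for a, b in combinations(sorted(episode), 2):
            pair_counts[(a, b)] += 1
    return {frozenset(p) for p, n in pair_counts.items() if n >= threshold}

episodes = [
    {"door", "kitchen", "north-wall"},
    {"door", "kitchen", "table"},
    {"door", "kitchen"},
]
# "door" and "kitchen" co-occur in all three episodes
assert frozenset({"door", "kitchen"}) in coarsen(episodes, 3)
```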
Composition + test
Casteigts et al. (2019) model temporal data as a sequence of static graphs G₁, G₂, …, Gδ. Two abstract operations:
- Compose: combine snapshots over an interval (union of structures, transitive closure of co-occurrence)
- Test: check whether the composed result satisfies a property (recurring structure, stable feature, frequent operator sequence)
Any parameter computable through compose + test runs in O(δ) operations, which is optimal (Casteigts et al., 2019). The framework is generic: swap the compose and test functions, compute a different regularity.
For Soar:
| Operation | Soar implementation |
|---|---|
| Compose | Union of WME structures across episodes in a window. Co-occurring attribute-value pairs accumulate counts. |
| Test | Does the composed structure meet a frequency threshold? Has this operator sequence appeared in ≥ k episodes? |
| Write | Create new SMEM graph structure encoding the regularity. Set initial base-level activation from frequency count. |
| Trigger | Goal completion, idle time, or episode accumulation threshold. The architectural equivalent of sleep. |
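The compose + test pattern above can be sketched in a single O(δ) pass. Function names and thresholds here are assumptions for illustration, not Soar API:

```python
# Compose + test sketch: one pass over delta episodes accumulates
# co-occurrence counts (compose); a frequency threshold decides which
# regularities would be written to SMEM (test).

from collections import Counter

def consolidate(episodes, k):
    """episodes: list of sets of WMEs; k: frequency threshold.
    Returns the WMEs that recur in >= k episodes."""
    counts = Counter()
    for episode in episodes:        # compose: single linear pass
        counts.update(episode)
    return {w for w, n in counts.items() if n >= k}  # test

episodes = [
    {("room", "has", "door"), ("door", "state", "open")},
    {("room", "has", "door"), ("agent", "holds", "key")},
    {("room", "has", "door")},
]
assert consolidate(episodes, 3) == {("room", "has", "door")}
```

Swapping the compose function (union, transitive closure) or the test (frequency, stability, operator-sequence match) computes a different regularity without changing the loop.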
Prior art: Zep
Zep (Rasmussen, 2025) implements this pipeline for LLM agents with three hierarchical subgraph tiers:
- Episodic subgraph: raw episodes stored losslessly with dual timestamps
- Semantic subgraph: entities and relations extracted from episodes, resolved against existing graph nodes
- Community subgraph: clusters of strongly-connected semantic entities, summarized at a higher level
Entity extraction + resolution produces the semantic subgraph. Label propagation clusters it into communities, extending incrementally as new episodes arrive. Soar’s SMEM already has the graph database and activation-based retrieval.
Verification
New SMEM structures should decay if they don’t match future episodes. Base-level activation already biases retrieval by recency and frequency (§6, p.12). A generated structure that is never retrieved will naturally lose activation. One addition: on retrieval, compare the generalization against the current episode. If it contradicts, mark for review. This closes the loop.
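The retrieval-time check can be sketched as follows, with hypothetical names: a retrieved generalization is compared against the current episode, and a contradicted structure is queued for review instead of silently trusted.

```python
# Verification sketch (names hypothetical): an attribute present in the
# episode with a different value than the generalization predicts counts
# as a contradiction and flags the structure for review.

def check_on_retrieval(generalization, episode, review_queue):
    """generalization, episode: sets of (id, attr, value) WMEs."""
    observed = {(ident, attr): value for ident, attr, value in episode}
    for ident, attr, value in generalization:
        if observed.get((ident, attr), value) != value:
            review_queue.append(generalization)
            return False
    return True

queue = []
gen = {("door", "state", "open")}
assert check_on_retrieval(gen, {("door", "state", "open")}, queue)
assert not check_on_retrieval(gen, {("door", "state", "closed")}, queue)
assert len(queue) == 1
```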
2. Chunking–RL composition
Chunking requires deterministic substate results, but RL uses stochastic selection. The two cannot compose (§4, p.10). Laird’s planned fix: gate chunking on RL convergence.
| | Before | After |
|---|---|---|
| Chunking | Disabled when RL drives the decision | Fires once RL preferences converge |
| RL | Numeric preferences tuned but never compiled | Stable policies compiled to production rules |
| Speed | RL computation every decision cycle | Converged decisions fire as direct rule matches |
| Chunks | Accumulate without review | Dead chunks detected by trauma recurrence, retracted |
| Contract | Any stable decision policy, whether discovered by deliberation or by RL, eventually compiles into a direct rule. | |
Posterior convergence gate
The simplest convergence test: track the exponential moving average of Q-value changes for each RL rule. When the EMA drops below a threshold, the preference has stabilized. This requires no architectural modification to Soar’s RL, just a monitoring wrapper. A Bayesian posterior over Q-values would give a full uncertainty estimate, but it requires changing how Soar stores RL values from point estimates to distributions. EMA is sufficient.
ACT-R’s production compilation + utility learning is the closest parallel in another architecture, but there too the interaction is uncoordinated: compilation fires whenever two productions fire in sequence, regardless of whether utility values have stabilized. The gating idea is novel.
For Soar:
| Step | Soar implementation |
|---|---|
| Track | For each RL rule, maintain an EMA of \|ΔQ\| across recent updates. |
| Gate | When EMA drops below threshold τ, the preference has converged. Flag the decision as deterministic. |
| Compile | Chunking fires on the flagged substate. The resulting chunk encodes the converged policy as a production rule. |
| Continue | RL keeps running on unconverged decisions. Chunking picks off the stable ones as they settle. |
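The track-and-gate steps above can be sketched as a monitoring wrapper. The class, parameter values, and threshold τ are assumptions for illustration, not Soar internals:

```python
# EMA convergence gate sketch: track an exponential moving average of
# |ΔQ| per RL rule and flag the rule as converged once it drops below
# the threshold tau, at which point chunking could fire.

class ConvergenceGate:
    def __init__(self, alpha=0.1, tau=0.01):
        self.alpha = alpha   # EMA smoothing factor
        self.tau = tau       # convergence threshold
        self.ema = {}        # rule id -> EMA of |delta Q|

    def update(self, rule, delta_q):
        """Feed one RL update; returns True once the rule has converged."""
        prev = self.ema.get(rule, abs(delta_q))
        self.ema[rule] = (1 - self.alpha) * prev + self.alpha * abs(delta_q)
        return self.ema[rule] < self.tau

gate = ConvergenceGate()
converged = False
for step in range(200):
    # Q-value updates that shrink as the policy settles
    converged = gate.update("rule-1", 1.0 / (step + 1) ** 2)
assert converged  # EMA has settled below tau; chunking may fire
```

Nothing in Soar's RL needs to change: the wrapper only observes the magnitude of each update.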
Pruning dead chunks
Chunks accumulate. EBBS improved chunk quality (§4, p.10) but nothing prunes the store. The parts bin suggests prototype condensation again: periodically scan the chunk store, identify rules that never fire (dead code) or that are superseded by more specific rules, and retract them.
The trigger is the trauma recurrence heuristic: count(similar_failures) > 1 signals a bad chunk. If the same impasse type recurs despite a chunk that should prevent it, the chunk is wrong. Retract and rechunk.
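The recurrence trigger can be sketched directly. Names are hypothetical; the point is only the `count(similar_failures) > 1` check:

```python
# Trauma-recurrence sketch: failures keyed by impasse type and context;
# a second similar failure despite an existing chunk marks that chunk
# for retraction and re-chunking.

from collections import Counter

class ChunkReviewer:
    def __init__(self):
        self.failures = Counter()

    def report_failure(self, impasse_type, context, chunk):
        """Returns the chunk to retract when a similar failure recurs."""
        self.failures[(impasse_type, context)] += 1
        if self.failures[(impasse_type, context)] > 1:
            return chunk   # count(similar_failures) > 1: retract
        return None

reviewer = ChunkReviewer()
assert reviewer.report_failure("operator-tie", "grasp", "chunk-42") is None
assert reviewer.report_failure("operator-tie", "grasp", "chunk-42") == "chunk-42"
```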
3. Episodic discrimination
Every decision cycle produces an episode (§7, p.13). At ~50ms per cycle (§10, item 3, p.18), that’s 72,000 episodes per hour. Retrieval cost grows with total count.
| | Before | After |
|---|---|---|
| Storage | Every decision cycle → episode | Only novel/important cycles stored at full fidelity |
| Volume | 72,000 episodes/hour, unbounded | Fraction stored, bounded by importance threshold |
| Retrieval | Old episode cost grows with total count | Smaller store, faster retrieval |
| Quality | Routine episodes dilute the store | DPP thinning ensures stored episodes are diverse |
| Contract | Retrieval cost stays proportional to important episodes, not total decision cycles. | |
DPP importance gate
The simplest approach is surprise-based gating: store an episode when its content deviates significantly from predictions. Isele & Cosgun (AAAI 2018) identified four criteria for selective experience replay: surprise (high TD error), reward, distribution matching, and coverage maximization. For write-time gating in Soar:
| Signal | Implementation |
|---|---|
| Relevance | Reward proximity (positive or negative), impasse resolution, operator application that changed the goal stack. |
| Similarity | WME overlap with recent episodes. High overlap = routine. Low overlap = novel. |
| Gate | Episodes with high relevance and low similarity to recent episodes are stored at full fidelity. Routine episodes are stored at reduced fidelity or skipped. |
For agents that encounter many similar surprising episodes (e.g., repeated failures at the same task), a periodic DPP-based thinning pass can ensure the stored set stays diverse. This separates the hot path (surprise gate, cheap, per-cycle) from the batch operation (DPP thinning, periodic, ensures diversity). Neuroscience supports this two-phase model: synaptic tagging marks salient episodes at encoding time; sharp-wave ripples during sleep selectively consolidate them (Science, 2024).
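The hot-path gate can be sketched as follows. The window size and overlap threshold are illustrative assumptions: Jaccard overlap with recent episodes measures similarity, and an episode is stored at full fidelity only if it is both relevant and sufficiently novel.

```python
# Surprise-gate sketch (thresholds are assumptions): cheap per-cycle
# novelty check against a rolling window of recent episodes; the
# periodic DPP thinning pass would run separately, as a batch job.

from collections import deque

class SurpriseGate:
    def __init__(self, window=50, max_overlap=0.8):
        self.recent = deque(maxlen=window)  # rolling episode window
        self.max_overlap = max_overlap

    def admit(self, episode, relevant):
        """episode: set of WMEs. True if it should be stored fully."""
        novel = all(
            len(episode & past) / len(episode | past) < self.max_overlap
            for past in self.recent
        )
        self.recent.append(episode)
        return relevant and novel

gate = SurpriseGate()
e1 = {("door", "state", "open"), ("agent", "at", "kitchen")}
assert gate.admit(e1, relevant=True)        # novel and relevant: store
assert not gate.admit(e1, relevant=True)    # exact repeat: skip
```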
Fewer episodes, faster consolidation
Treatments 1 and 3 reinforce each other. Discrimination reduces the volume; consolidation reads the survivors and compresses them into semantic knowledge.
Four cells, four algorithms
| Treatment | Parts bin cell | Soar module | Effect |
|---|---|---|---|
| Episodic→Semantic | Graph coarsening | EPMEM → SMEM | World knowledge grows from experience |
| RL-gated chunking | EMA convergence gate | RL rules → production rules | Stochastic selection compiles to deterministic rules |
| Chunk review | Trauma recurrence | Procedural memory | Dead and wrong chunks retracted |
| Episodic discrimination | Surprise gate + DPP thin | Episodic Learning | Store fewer, better episodes |
None of these require new architectural commitments. The algorithms exist. The modules exist. The triggers are straightforward.
And yet.
The wall behind the walls
Gate chunking on RL convergence. Consolidate episodes into semantic knowledge. Discriminate what’s worth remembering. Soar would be a more complete architecture. It would still need a human.
Laird says it plainly: “Without a human to closely guide it, Rosie is unable to get very far on its own, especially in being unable to learn new abstract symbolic concepts on its own” (§10, p.20). Rosie is Soar’s most capable agent. It learns sixty tasks from natural language instruction. It needs an instructor for every one.
This isn’t a failure of Rosie. It’s a pattern. TacAir-Soar’s 8,000 rules were written by engineers. PROPS takes declarative rule descriptions authored by people. Every successful Soar agent in the appendix has a human somewhere in the causal chain, providing knowledge that the architecture cannot generate from experience alone.
Soar treats this as a development-time dependency, something to be engineered away with the next learning mechanism. Laird knows better. His own assessment: “What I feel is most missing from Soar is its ability to ‘bootstrap’ itself up from the architecture and a set of innate knowledge into being a fully capable agent” (§10, p.20). Forty years of evidence supports the stronger reading: the human is load-bearing.
Complementation first
The research program has been: build the autonomous agent, then optionally add human interaction. Instructo-Soar and Rosie add a teacher, but as one application among many. The architecture doesn’t require it.
What if the order is reversed?
Step one: complementation. Human and agent, each filling the other’s gaps. The agent has speed, consistency, parallel rule firing, tireless processing; the human has judgment, novel category formation, the ability to say “this is what matters.” Neither is intelligent alone. Together they are.
Step two: bootstrap. Use that composite intelligence to study what it does. Compress its patterns. Gradually move capabilities from the human side to the agent side. Chunking already does this at the substate level: compile deliberation into direct rules. The same principle, applied at the architecture level: compile human guidance into autonomous capability.
Soar’s forty-year trajectory suggests that generating the knowledge to fill these cells requires more intelligence than any single agent has demonstrated. If the architecture can’t generate it alone, it has to borrow it. The only place it currently exists at the required level is people. Solving complementation first isn’t a retreat from autonomy. It’s the path to it.
Soar already has the mechanism. An impasse means “I don’t have enough knowledge.” A substate means “go get it.” What if the default resolution of certain impasses was “ask the human”? Not as a fallback. As a first-class architectural response.
Chunking would compile the answer. The agent never asks the same question twice. Over time, fewer impasses reach the human. The boundary moves.
The four prescriptions above are correct. They fill real gaps. But they’re step two. Step one is acknowledging that the human who has always been there, quietly making every Soar agent work, belongs in the architecture.
What flows back
The prescription draws from the parts bin. Soar gives back. Six algorithms from Soar’s architecture are now in the parts bin:
| Algorithm | Parts bin cell | What it does |
|---|---|---|
| RETE network | Cache × graph | Incremental pattern matching. Processes WME deltas, caches partial match state. Pays for change, not total knowledge. |
| Truth maintenance | Filter × graph | I-supported structures auto-retract when their creating rule unmatches. Data-driven, no explicit delete. |
| Staged preferences | Attend × flat | Reject-first, rank-survivors. Process prohibit/reject before better/worse/best. Exit early if rejection resolves it. |
| EBC/Chunking | Consolidate × tree | Backtrace through dependency tree, identify necessary conditions, compile deliberation into production rule. |
| Delta-bar-delta | Consolidate × flat | Per-production adaptive learning rate. Each rule gets its own alpha, updated on every RL update. |
| Delta episodic storage | Remember × sequence | Store only changes between snapshots. Interval representation for persistent elements. |
Soar’s largest contribution is forty years of evidence for the ordering constraint. Every agent in the appendix is a data point: the right mechanisms, assembled by a human. Together they produced intelligent behavior that neither could produce alone. That’s complementation. It was always complementation. The architecture just didn’t have a name for it.
Based on Laird (2022), “Introduction to the Soar Cognitive Architecture”; Casteigts et al. (2019), “Computing Parameters of Sequence-Based Dynamic Graphs”; Rasmussen (2025), “Zep: A Temporal Knowledge Graph Architecture”; the parts bin; and the Natural Framework. Written via the double loop.