Prescription: Soar

Part of the cognition series. Follows the Diagnosis and SOAP Notes.

The diagnosis found four constraints. Each has a known relief. This post provides the algorithmic detail, pulling from the parts bin. The first three are things to build; the last section describes what opens up when they’re in place.

Episodic-to-semantic merge

The primary operation. Read recurring patterns from episodes and write them as semantic generalizations. Merged semantic entries replace the episodes they were extracted from. The merged entry is the knowledge. A demonstration PR implements merge and eviction.

|  | Before | After |
|---|---|---|
| SMEM | Grows only by agent command or preloading | Grows automatically from experience |
| EPMEM | Episodes accumulate without bound | Regularities extracted, episodes compacted |
| Transfer | Agent re-learns spatial layouts per task | Recurring layouts written to SMEM, available across tasks |
| Rosie | Needs human guidance for every new domain | Generalizes from prior task episodes autonomously |

Contract: Semantic memory reflects what the agent has experienced, beyond what it was told.

Sequences of graph snapshots

EPMEM stores episodes as change-deltas indexed by decision cycle number (§7, p.13). That’s a sequence of graph snapshots: each episode is a set of WME changes, and each WME is a node in an attribute-value graph.

SMEM stores graph structures in SQLite (§6, p.12). The target is a graph database rather than a flat one. The architecture diagram (Laird, 2022, Fig. 1) shows smem as a tree, and the implementation stores parent-child relationships. Merging must produce hierarchy at multiple scales instead of a single flat target.

The parts bin maps this to two cells:

| Stage | Sequence | Graph |
|---|---|---|
| Consolidate | Temporal composition (Casteigts et al. 2019) | Graph coarsening (survey, 2024) |

Soar’s episodes are sequences of graphs, so both operations apply: coarsening operates within each snapshot, composition operates across them. CLARION’s Rule-Extraction-Refinement does something analogous at the implicit→explicit boundary (Sun, 2016). No cognitive architecture does it at the episodic→semantic boundary.

Graph coarsening

Graph coarsening shrinks each snapshot; temporal composition finds stable structures across windows

Collapse co-occurring nodes within a single snapshot into supernodes while preserving structural properties (graph reduction survey, 2024). Each episode’s WME graph gets smaller: attribute-value pairs that always co-occur merge into a single supernode. The coarsened snapshot is what enters the temporal composition window.
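A minimal sketch of the within-snapshot step, assuming an episode's WME graph is a set of (identifier, attribute, value) triples; the triple encoding and the joined supernode naming are illustrative, not Soar's actual representation:

```python
from collections import defaultdict

def coarsen(edges):
    """Collapse value nodes with identical incidence into supernodes.

    `edges` is a set of (identifier, attribute, value) triples standing
    in for one episode's WME graph. Values that fill exactly the same
    (identifier, attribute) slots always co-occur, so they merge.
    """
    signature = defaultdict(set)
    for ident, attr, value in edges:
        signature[value].add((ident, attr))
    groups = defaultdict(list)
    for node, sig in signature.items():
        groups[frozenset(sig)].append(node)
    # Rename every member of a multi-node group to one joined supernode.
    merged = {n: "+".join(sorted(g))
              for g in groups.values() if len(g) > 1 for n in g}
    return {(i, a, merged.get(v, v)) for i, a, v in edges}

snapshot = {("s1", "block", "b1"), ("s1", "block", "b2"), ("s1", "table", "t1")}
coarse = coarsen(snapshot)
# b1 and b2 fill the same slot, so they collapse into supernode "b1+b2"
```

The coarsened set is what enters the composition window; a production implementation would preserve structural invariants from the reduction survey rather than raw incidence equality.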

Composition + test

Temporal composition: older snapshots fade, compose+test extracts stable structure

Casteigts et al. (2019) model temporal data as a sequence of static graphs G₁, G₂, …, Gδ. Two abstract operations: compose, which merges two adjacent graphs in the sequence into one, and test, which checks a predicate on a composed graph.

Any parameter computable through compose + test runs in O(δ) operations, which is optimal (Casteigts et al., 2019). The framework is generic: swap the compose and test functions, compute a different regularity.

Compose at different δ windows to build a tree of abstractions: short δ yields low-level entries, medium δ yields structural patterns, long δ yields domain heuristics. This matches smem’s existing tree structure and Zep’s three-tier hierarchy (episodic → semantic → community subgraphs).

For Soar:

| Operation | Soar implementation |
|---|---|
| Compose | Union of WME structures across episodes in a window. Co-occurring attribute-value pairs accumulate counts. |
| Test | Does the composed structure meet a frequency threshold? Has this operator sequence appeared in ≥ k episodes? |
| Write | Create new SMEM graph structure encoding the regularity. Set initial base-level activation from frequency count. |
| Trigger | Goal completion, idle time, or episode accumulation threshold. The architectural equivalent of sleep. |
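The compose and test rows can be sketched end-to-end. A toy version, assuming snapshots are sets of WME triples, with a plain frequency count standing in for base-level activation:

```python
from collections import Counter

def consolidate(window, k):
    """Compose + test over a window of episode snapshots.

    Compose: union the snapshots, counting each WME triple once per
    episode. Test: a triple is a regularity if it appears in >= k
    episodes. One pass over the window, O(delta) compositions.
    """
    counts = Counter()
    for snapshot in window:          # compose
        counts.update(snapshot)
    return {wme for wme, n in counts.items() if n >= k}  # test

window = [
    {("hall", "left-of", "lab")},
    {("hall", "left-of", "lab"), ("door", "state", "open")},
    {("hall", "left-of", "lab")},
]
stable = consolidate(window, k=3)
# only the layout triple clears k=3; the transient door state does not
```

Swapping the Counter update and the threshold predicate for other compose/test pairs yields the other regularities the framework can compute.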

Zep (Rasmussen, 2025) implements a similar episodic→semantic pipeline for LLM agents, with community detection as a third tier. Soar’s SMEM already has the graph database and activation-based retrieval; the missing piece is the consolidation loop.

Episodic eviction

Episodes whose regularities have been merged into semantic memory are redundant. The semantic entry already contains what mattered, so eviction incurs no reconstruction debt.

|  | Before | After |
|---|---|---|
| Storage | Every decision cycle → episode | Only novel/important cycles stored at full fidelity |
| Volume | 72,000 episodes/hour, unbounded | Fraction stored, bounded by importance threshold |
| Retrieval | Old episode cost grows with total count | Smaller store, faster retrieval |
| Quality | Routine episodes dilute the store | DPP thinning ensures stored episodes are diverse |

Contract: Retrieval cost stays proportional to important episodes, instead of total decision cycles.

Eviction alone trades match cost for reconstruction cost. Derbinsky & Laird’s own data (2013, Fig. 4) shows reconstruction latency breaking the 50ms threshold even at d=0.5, because reconstruction from episodic memory scales with working-memory size at encoding time (2013, §3). Multi-tier eviction compounds the problem: WM reconstructs from smem, smem from epmem, each hop consuming budget. Merging avoids the chain entirely. Compress N episodes into one semantic entry, and the sources become redundant.

Write-time discrimination

Not all episodes are worth storing. The simplest approach is surprise-based gating: store an episode when its content deviates significantly from predictions. Isele & Cosgun (AAAI 2018) identified four criteria for selective experience replay: surprise (high TD error), reward, distribution matching, and coverage maximization.

| Signal | Implementation |
|---|---|
| Relevance | Reward proximity (positive or negative), impasse resolution, operator application that changed the goal stack. |
| Similarity | WME overlap with recent episodes. High overlap = routine. Low overlap = novel. |
| Gate | Episodes with high relevance and low similarity to recent episodes are stored at full fidelity. Routine episodes are stored at reduced fidelity or skipped. |

Episodic discrimination: decision cycle → importance score → DPP gate → novel episodes stored at full fidelity, routine episodes skipped

For agents that encounter many similar surprising episodes (e.g., repeated failures at the same task), a periodic DPP-based thinning pass ensures the stored set stays diverse. This separates the hot path (surprise gate, cheap, per-cycle) from the batch operation (DPP thinning, periodic, ensures diversity). Neuroscience supports this two-phase model: synaptic tagging marks salient episodes at encoding time; sharp-wave ripples during sleep selectively consolidate them (Science, 2024).
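The hot-path gate can be sketched as follows, with Jaccard overlap standing in for WME similarity. The threshold and the set-of-triples episode encoding are illustrative, and the periodic DPP thinning pass is not shown:

```python
def should_store(relevant, episode, recent, overlap=0.5):
    """Per-cycle write gate: store at full fidelity only when the episode
    carries a relevance signal and is dissimilar to recent episodes."""
    if not relevant:
        return False          # no reward proximity, impasse, or goal change
    for past in recent:
        union = episode | past
        if union and len(episode & past) / len(union) > overlap:
            return False      # routine: near-duplicate of a recent episode
    return True

recent = [{("a", 1), ("b", 2), ("c", 3)}]
should_store(True, {("a", 1), ("b", 2), ("c", 3), ("d", 4)}, recent)  # routine
should_store(True, {("x", 9), ("y", 8)}, recent)                      # novel
```

The gate is O(recent window) per cycle; the batch DPP pass would run over the survivors on the consolidation trigger, not here.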

Fewer episodes, faster consolidation

Eviction and consolidation reinforce each other. Discrimination reduces the volume; consolidation reads the survivors and compresses them into semantic knowledge.

Semantic maintenance with back-invalidation

Merged semantic entries should decay if they stop matching incoming episodes. Base-level activation handles the common case. The hard case is cross-tier coherence.

|  | Before | After |
|---|---|---|
| SMEM | Hand-built, static, no eviction | Grows from experience, decays by activation, maintains provenance |
| Coherence | R4 assumes smem is stable | Back-invalidation propagates eviction across tiers |
| Provenance | None: smem entries are opaque | Every generalization traces to its source episodes |

Contract: Semantic memory is self-maintaining: grows from experience, decays when experience contradicts it, and never orphans dependent structures silently.

Three-tier back-invalidation: WM forgets under R4, smem entry evicted, orphaned WME flagged

The R4 coherence problem

Derbinsky & Laird’s working-memory forgetting policy has a requirement that makes cross-tier coupling explicit: R4 dictates that the mechanism only removes elements from working memory that augment objects in semantic memory (2013, §5). The rationale: you can only safely forget what you can reconstruct. Today this is benign because smem rarely changes. But with automatic learning and eviction, smem changes actively. Every smem deletion can orphan WMEs that were “safely” forgotten from working memory under R4. The WME is gone, its backup is gone, reconstruction fails silently.

Union-find forest over smem

A union-find forest over smem entries gives the structure needed for both provenance and coherence. Parent pointers trace each semantic entry back to its source episodes, and each episode forward to the semantic entry it contributed to. The pointers serve double duty: followed forward, they answer provenance queries; followed backward, they carry invalidation when a semantic entry is evicted.

The pointers are the cross-tier coherence mechanism. No separate dependency tracker required for the common case. Union-find’s parent pointers are plain integers, compatible with smem’s existing SQLite schema and with the JTMS dependency links already in working memory. No new pointer type is needed; the forest threads through the same identifier space. The union-find compaction experiment implements this structure for a different domain (embedding deduplication) and confirms that find, union, and expand compose without a separate index.
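A minimal sketch of the forest, assuming episodes and smem entries share one integer id space as the text suggests for the SQLite schema; the class and method names are illustrative:

```python
class ProvenanceForest:
    """Union-find over one integer id space for episodes and smem entries.
    A parent pointer from episode to semantic entry is both provenance
    (where did this merge go?) and the invalidation path back."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Follow parent pointers to the representative (with path halving)."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, episode, entry):
        """Record that `episode` was merged into semantic `entry`."""
        self.parent[self.find(episode)] = self.find(entry)

forest = ProvenanceForest()
for ep in (1, 2, 3):
    forest.union(ep, 100)   # three episodes merged into smem entry 100
forest.find(2)              # -> 100, the entry episode 2 contributed to
```

Because the parent map holds plain integers, it can live in an ordinary two-column table alongside smem's existing rows.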

Back-invalidation protocol

The hard case: evicting a semantic entry breaks R4’s reconstruction guarantee for any WMEs forgotten under it (domino 2). Before evicting:

  1. Walk the reverse pointers to find WMEs forgotten under R4 that depend on this smem entry
  2. If any exist, either promote them back to working memory or accept the loss
  3. If the smem entry was the root of a union-find tree, propagate invalidation to child entries

This is a cache coherence problem. The existing JTMS mechanism already handles dependency-driven retraction within working memory; the missing wiring is back-invalidation across the tier boundary. Union-find pointers provide that wiring without a separate JTMS installation.
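The three steps above can be sketched with dictionaries standing in for the tiers. Every name here is illustrative, and step 2's choice is hard-coded to promotion rather than accepting the loss:

```python
def evict_entry(entry, smem, forgotten_under_r4, wm):
    """Back-invalidate before removing a semantic entry.

    `forgotten_under_r4[entry]` holds WMEs that were dropped from
    working memory on the promise this entry could reconstruct them;
    `smem[entry]["children"]` holds dependent entries in the forest.
    """
    # 1-2. Find dependent WMEs and promote them back into working memory.
    for wme in forgotten_under_r4.pop(entry, []):
        wm.add(wme)
    # 3. Propagate invalidation down the union-find tree.
    for child in list(smem[entry].get("children", [])):
        evict_entry(child, smem, forgotten_under_r4, wm)
    del smem[entry]  # only now is removal safe

smem = {"layout": {"children": ["corridor"]}, "corridor": {"children": []}}
forgotten = {"corridor": [("hall", "left-of", "lab")]}
wm = set()
evict_entry("layout", smem, forgotten, wm)
# wm now holds the promoted WME; both entries are gone from smem
```

In Soar the promoted WMEs would re-enter working memory through the normal i/o or retrieval path, not a raw set insert.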

Structural redundancy via tree inclusion

BLA handles staleness — entries that haven’t been accessed decay. But an actively retrieved entry can still be redundant if a richer entry structurally contains it. After a merge produces a new smem entry, check whether existing entries are structurally dominated: entry B is redundant if B’s graph embeds into A’s via tree inclusion (Kilpeläinen & Mannila, 1995) — same nodes, same parent-child relationships, possibly with extra structure in A. The check is O(|B|·|A|) per pair. After merge, run the dominance check against the union-find cluster’s neighbors. Dominated entries are evicted through the back-invalidation protocol above.
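A naive recursive version of the dominance check, on (label, children) tuples. It is stricter than Kilpeläinen & Mannila's tree inclusion (children must map to direct children, not descendants, and the greedy matching can miss some embeddings), but it shows the shape of the test:

```python
def dominates(a, b):
    """Return True if tree b embeds into tree a: root labels match, and
    each child of b is greedily matched to a distinct child of a.
    Trees are (label, [children]) tuples. Simplified sketch; the full
    tree-inclusion algorithm also allows child-to-descendant mappings."""
    if a[0] != b[0]:
        return False
    used = set()
    for b_child in b[1]:
        for i, a_child in enumerate(a[1]):
            if i not in used and dominates(a_child, b_child):
                used.add(i)
                break
        else:
            return False   # some child of b has no home in a
    return True

rich = ("state", [("color", [("red", [])]), ("size", [("big", [])])])
poor = ("state", [("color", [("red", [])])])
# dominates(rich, poor) holds: poor is redundant once rich exists
```

Run against only the union-find cluster's neighbors, as the text proposes, the quadratic per-pair cost stays bounded by cluster size rather than smem size.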

The same check applies to procedural memory. A chunk compiled from a richer substate may structurally contain an older chunk’s condition tree. Tree inclusion on the condition DAGs identifies not just dead chunks (never fire) but redundant ones (fire but do nothing the dominating chunk doesn’t already do).

BLA and tree inclusion are orthogonal: BLA evicts the stale, tree inclusion evicts the subsumed. Both feed into back-invalidation.

RL-gated chunking

Chunking requires deterministic substate results, but RL uses stochastic selection. The two cannot compose (§4, p.10). Laird’s planned fix: gate chunking on RL convergence. A demonstration PR exists.

|  | Before | After |
|---|---|---|
| Chunking | Disabled when RL drives the decision | Fires once RL preferences converge |
| RL | Numeric preferences tuned but never compiled | Stable policies compiled to production rules |
| Speed | RL computation every decision cycle | Converged decisions fire as direct rule matches |
| Forgetting | RL rules can't be safely forgotten | Compiled chunks are reconstructible; RL rules become forgettable |

Contract: Any stable decision policy, whether discovered by deliberation or by RL, eventually compiles into a direct rule.

Posterior convergence gate

Track the exponential moving average of Q-value changes for each RL rule. When the EMA drops below a threshold, the preference has stabilized. This requires no architectural modification to Soar’s RL, just a monitoring wrapper.
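A sketch of the monitoring wrapper, assuming access to each RL update's ΔQ; the α, τ, and reopen-factor defaults are illustrative:

```python
class ConvergenceGate:
    """EMA of |dQ| per RL rule. Below tau -> converged, chunking may
    fire; a later spike past reopen * tau re-flags the rule for
    rechunking (the reward-drift case discussed below)."""

    def __init__(self, alpha=0.2, tau=0.05, reopen=4.0):
        self.alpha, self.tau, self.reopen = alpha, tau, reopen
        self.ema = {}
        self.converged = set()

    def update(self, rule, delta_q):
        """Feed one TD update; return True if the rule is converged."""
        prev = self.ema.get(rule, abs(delta_q))
        ema = (1 - self.alpha) * prev + self.alpha * abs(delta_q)
        self.ema[rule] = ema
        if ema < self.tau:
            self.converged.add(rule)           # stable: eligible to compile
        elif rule in self.converged and ema > self.reopen * self.tau:
            self.converged.discard(rule)       # drift: flag for rechunking
        return rule in self.converged

gate = ConvergenceGate(alpha=0.5, tau=0.05)
for dq in [1.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0]:
    stable = gate.update("avoid*wall", dq)
# stable is True once the EMA of |dQ| decays below tau
```

The same object implements both directions of the gate: the threshold crossing that licenses compilation and the spike that retracts it.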

The stochasticity requirement derives from Landauer’s principle: deterministic selection can’t explore, so any system with a real Attend stage must inject noise. But Consolidate needs deterministic input to compile. The tension is physical rather than architectural:

|  | Natural Framework | Soar |
|---|---|---|
| Attend | Must be stochastic (derived) | RL exploration is stochastic (observed) |
| Consolidate | Reads from Remember, writes policy | Chunking reads substates, writes rules |
| The gap | Phase transition required | Gate required |

The convergence gate is the phase transition. Attend starts stochastic, EMA tracks stability, and when it crosses threshold the decision becomes deterministic. Consolidate can finally read it.

ACT-R has the same uncoordinated interaction between utility learning and production compilation. Any architecture with a real Attend and a real Consolidate will hit this wall. The gate is the generic fix.

| Step | Soar implementation |
|---|---|
| Track | For each RL rule, maintain an EMA of \|ΔQ\| across recent updates. |
| Gate | When EMA drops below threshold τ, the preference has converged. Flag the decision as deterministic. |
| Compile | Chunking fires on the flagged substate. The resulting chunk encodes the converged policy as a production rule. |
| Continue | RL keeps running on unconverged decisions. Chunking picks off the stable ones as they settle. |

RL-gated chunking: RL rules → TD updates → variance check → if converged, chunking fires → new production rule

Pruning dead chunks

Chunks accumulate. Soar already has base-level activation forgetting for procedural memory: rules that never fire lose activation and are eventually forgotten. This handles dead chunks. But BLA can’t handle chunks that are actively wrong. A chunk compiled from stale knowledge fires confidently in a changed world, keeping its activation high.

Two detection mechanisms:

| Signal | Detection | Response |
|---|---|---|
| Trauma recurrence | count(similar_failures) > 1 — same impasse recurs despite a chunk that should prevent it. | Retract chunk, re-enter substate, rechunk. |
| Reward drift | EMA of \|ΔQ\| spikes after stability. ADWIN: variable-length window, bounded false-positive rate. | Flag chunk, re-enter substate, rechunk. |

The gate that says “stable enough to compile” is the same gate that says “no longer stable, recompile.” Composition without pruning is another append-only store.

What opens up

The three reliefs above build the drain. Everything below follows as a side effect.

Novel composition. With bounded stores, newly perceived memories compose with old ones. A spatial layout merged from 50 episodes combines with a fresh navigation failure to synthesize an operator proposal that neither would produce alone. Merged semantic entries participate alongside fresh percepts in the elaboration phase. No new mechanism is needed. The elaboration phase, the impasse mechanism, and chunking already do this. They just need a bounded cache to work with.

Wider perception. Derbinsky & Laird’s robot throttled perception because the drain didn’t work (domino 1). R4 restricted what could leave working memory to what was backed up in semantic memory. With automatic semantic learning, R4’s scope expands. Newly perceived categories get merged into smem, making their WMEs forgettable. The drain opens, so the valve can open. The architecture no longer needs to pre-symbolize everything or “usually ignor[e] frequently changing low-level sensory data” (§7, p.13).

Filtering between elaboration and selection. A testable hypothesis rather than a known algorithm. Soar’s forward pass treats elaboration and selection as causally dependent phases with no explicit filter between them (§2.2, p.5). A novelty gate (a frequency counter that passes novel WMEs and suppresses redundant ones) could change the architecture’s effective Perceive throughput. The parts bin has the components (change-point detection, truth maintenance); their composition within Soar’s decision cycle is uncharted.
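The hypothesized filter can be sketched with a plain frequency counter standing in for real change-point detection; the cap and the Counter are illustrative:

```python
from collections import Counter

def novelty_gate(wmes, seen, cap=3):
    """Pass a WME from elaboration through to selection only while its
    lifetime count is under `cap`; suppress it once it is routine."""
    passed = []
    for wme in wmes:
        seen[wme] += 1
        if seen[wme] <= cap:
            passed.append(wme)
    return passed

seen = Counter()
for cycle in range(5):
    visible = novelty_gate([("sensor", "wall", "near")], seen)
# after three cycles the WME is routine and `visible` comes back empty
```

Whether suppressing routine WMEs between the phases helps or harms decision quality is exactly the untested part of the hypothesis.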

Giving back

The prescription draws from the parts bin. Soar gives back. Six algorithms from Soar’s architecture are now in the parts bin:

| Algorithm | Parts bin cell | What it does |
|---|---|---|
| RETE network | Cache × graph | Incremental pattern matching. Processes WME deltas, caches partial match state. Pays for change instead of total knowledge. |
| Truth maintenance | Filter × graph | I-supported structures auto-retract when their creating rule unmatches. Data-driven, no explicit delete. |
| Staged preferences | Attend × flat | Reject-first, rank-survivors via run_preference_semantics(). Process prohibit/reject before better/worse/best. |
| EBC/Chunking | Consolidate × tree | Backtrace through dependency tree, identify necessary conditions, compile deliberation into production rule. |
| Delta-bar-delta | Consolidate × flat | Per-production adaptive learning rate. Each rule gets its own alpha, updated on every RL update. |
| Delta episodic storage | Remember × sequence | Store only changes between snapshots. Interval representation for persistent elements. |

Based on Laird (2022), “Introduction to the Soar Cognitive Architecture”; Derbinsky & Laird (2013), “Effective and efficient forgetting of learned knowledge”; Casteigts et al. (2019), “Computing Parameters of Sequence-Based Dynamic Graphs”; Rasmussen (2025), “Zep: A Temporal Knowledge Graph Architecture”; the parts bin; the union-find compaction experiment; and the Natural Framework. Written via the double loop.