Prescription: Soar

Part of the cognition series. Follows the Diagnosis: Soar.

The diagnosis found one systematic gap: procedural memory is the only store with a backward pass. Semantic memory, episodic memory, and perceptual LTM have no automatic learning mechanism that writes back to them. Chunking and RL are excellent compilers. They just don’t reach the other stores.

The prescription pulls from the parts bin: existing algorithms, matched to Soar’s data structures, that fill the missing Consolidate cells.

1. Episodic→Semantic consolidation

Semantic memory “does not have an automatic learning mechanism” (§6, p.13). Episodes accumulate in EPMEM without generalization (§7, p.13). The missing operation: read episodes, detect regularities, write compressed knowledge to SMEM.

|          | Before                                    | After                                                     |
|----------|-------------------------------------------|-----------------------------------------------------------|
| SMEM     | Grows only by agent command or preloading | Grows automatically from experience                       |
| EPMEM    | Episodes accumulate without bound         | Regularities extracted, episodes compacted                |
| Transfer | Agent re-learns spatial layouts per task  | Recurring layouts written to SMEM, available across tasks |
| Rosie    | Needs human guidance for every new domain | Generalizes from prior task episodes autonomously         |

Contract: Semantic memory reflects what the agent has experienced, not just what it was told.

Sequences of graph snapshots

EPMEM stores episodes as change-deltas indexed by decision cycle number (§7, p.13). That’s a sequence of graph snapshots: each episode is a set of WME changes, and each WME is a node in an attribute-value graph.

SMEM stores graph structures in SQLite (§6, p.12). The target is a graph database.

The parts bin maps this to two cells:

| Stage       | Sequence          | Graph        |
|-------------|-------------------|--------------|
| Consolidate | Grammar induction | GNN training |

Grammar induction detects recurring patterns in sequences. GNN training learns representations over graphs. For Soar’s symbolic graph structures, the closer fit is graph coarsening: collapse groups of co-occurring nodes across episodes into supernodes while preserving structural properties (graph reduction survey, 2024). CLARION’s Rule-Extraction-Refinement does something analogous at the implicit→explicit boundary (Sun, 2016). No cognitive architecture does it at the episodic→semantic boundary.

Composition + test

Casteigts et al. (2019) model temporal data as a sequence of static graphs G₁, G₂, …, Gδ. Two abstract operations:

  1. Compose: merge two adjacent graphs in the sequence into one.
  2. Test: evaluate a predicate on a composed graph.

Any parameter computable through compose + test runs in O(δ) operations, which is optimal (Casteigts et al., 2019). The framework is generic: swap the compose and test functions, compute a different regularity.
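A minimal sketch of the pattern in Python, using edge frequency as the regularity (an illustrative choice, not one from the paper): compose accumulates counts across snapshots in a single O(δ) pass; test keeps the edges seen often enough.

```python
from collections import Counter

def frequent_edges(snapshots, k):
    """Compose + test in one O(delta) pass: compose accumulates edge
    counts across snapshots; test keeps edges seen in >= k of them."""
    counts = Counter()
    for edges in snapshots:
        counts.update(set(edges))                     # compose: union with counts
    return {e for e, c in counts.items() if c >= k}   # test: frequency threshold

snapshots = [
    {("door", "locked"), ("key", "on-table")},
    {("door", "locked"), ("agent", "moving")},
    {("door", "locked"), ("key", "held")},
]
stable = frequent_edges(snapshots, k=3)   # {("door", "locked")}
```

Swapping the Counter update and the threshold predicate for other compose/test pairs yields other regularities over the same snapshot sequence.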

For Soar:

Temporal composition: episodes G₁...Gδ → compose (union + count) → test (freq ≥ k?) → new SMEM fact
| Operation | Soar implementation |
|-----------|---------------------|
| Compose   | Union of WME structures across episodes in a window. Co-occurring attribute-value pairs accumulate counts. |
| Test      | Does the composed structure meet a frequency threshold? Has this operator sequence appeared in ≥ k episodes? |
| Write     | Create new SMEM graph structure encoding the regularity. Set initial base-level activation from frequency count. |
| Trigger   | Goal completion, idle time, or episode accumulation threshold. The architectural equivalent of sleep. |
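The four rows can be sketched as one loop. This is a toy model, not Soar's API: SMEM is a plain dict, episodes are sets of attribute-value tuples, and `window` (the episode-accumulation trigger) and `k` are hypothetical parameters.

```python
import math
from collections import Counter

class Consolidator:
    """Sketch of the Compose/Test/Write/Trigger cycle over episodes."""

    def __init__(self, k=3, window=50):
        self.k, self.window = k, window
        self.buffer = []   # episodes awaiting consolidation
        self.smem = {}     # structure -> base-level activation

    def observe(self, episode):
        self.buffer.append(frozenset(episode))
        if len(self.buffer) >= self.window:   # Trigger: accumulation threshold
            self.consolidate()

    def consolidate(self):
        counts = Counter()                    # Compose: union + count
        for ep in self.buffer:
            counts.update(ep)
        for wme, c in counts.items():         # Test: frequency >= k
            if c >= self.k and wme not in self.smem:
                self.smem[wme] = math.log(c)  # Write: seed activation from count
        self.buffer.clear()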

Prior art: Zep

Zep (Rasmussen, 2025) implements this pipeline for LLM agents with three hierarchical subgraph tiers:

  1. Episodic subgraph: raw episodes stored losslessly with dual timestamps
  2. Semantic subgraph: entities and relations extracted from episodes, resolved against existing graph nodes
  3. Community subgraph: clusters of strongly-connected semantic entities, summarized at a higher level

Entity extraction + resolution produces the semantic subgraph. Label propagation clusters it into communities, extending incrementally as new episodes arrive. Soar’s SMEM already has the graph database and activation-based retrieval.
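A minimal label-propagation sketch (deterministic tie-breaking added so runs are reproducible; Zep's incremental version is more involved): each node repeatedly adopts the most common label among its neighbors until labels stabilize into communities.

```python
from collections import Counter

def label_propagation(adj, rounds=20):
    """Each node adopts the most common label among its neighbors;
    ties break toward the smallest label for determinism.
    `adj` maps node -> set of neighbor nodes."""
    labels = {n: n for n in adj}   # start with one unique label per node
    for _ in range(rounds):
        changed = False
        for n in sorted(adj):
            if not adj[n]:
                continue
            counts = Counter(labels[m] for m in adj[n])
            best = min(counts, key=lambda l: (-counts[l], l))
            if labels[n] != best:
                labels[n], changed = best, True
        if not changed:
            break
    return labels
```

On two disjoint triangles, the nodes of each triangle converge to a shared label, giving two communities.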

Verification

New SMEM structures should decay if they don’t match future episodes. Base-level activation already biases retrieval by recency and frequency (§6, p.12). A generated structure that is never retrieved will naturally lose activation. One addition: on retrieval, compare the generalization against the current episode. If it contradicts, mark for review. This closes the loop.
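A sketch of both halves of the loop, modeling WMEs as attribute→value pairs. The decay formula is the standard base-level form (ln Σ ageᵢ⁻ᵈ), consistent with the recency/frequency bias described above; the contradiction test is an illustrative definition, not Soar's.

```python
import math

def base_level_activation(retrieval_ages, d=0.5):
    """Base-level activation: ln(sum age^-d) over past retrievals.
    Structures that are never retrieved accumulate no terms and fall
    out of competition; d is the usual decay parameter."""
    return math.log(sum(age ** -d for age in retrieval_ages))

def contradicts(generalization, episode):
    """On retrieval, a consolidated fact {attr: value} is marked for
    review if the current episode asserts a different value for the
    same attribute."""
    return any(attr in episode and episode[attr] != value
               for attr, value in generalization.items())
```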

2. Chunking–RL composition

Chunking requires deterministic substate results, but RL uses stochastic selection. The two cannot compose (§4, p.10). Laird’s planned fix: gate chunking on RL convergence.

|          | Before                                       | After                                                |
|----------|----------------------------------------------|------------------------------------------------------|
| Chunking | Disabled when RL drives the decision         | Fires once RL preferences converge                   |
| RL       | Numeric preferences tuned but never compiled | Stable policies compiled to production rules         |
| Speed    | RL computation every decision cycle          | Converged decisions fire as direct rule matches      |
| Chunks   | Accumulate without review                    | Dead chunks detected by trauma recurrence, retracted |

Contract: Any stable decision policy, whether discovered by deliberation or by RL, eventually compiles into a direct rule.

Posterior convergence gate

The simplest convergence test: track the exponential moving average of Q-value changes for each RL rule. When the EMA drops below a threshold, the preference has stabilized. This requires no architectural modification to Soar’s RL, just a monitoring wrapper. A Bayesian posterior over Q-values would give a full uncertainty estimate, but it requires changing how Soar stores RL values from point estimates to distributions. EMA is sufficient.
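The monitoring wrapper fits in a few lines. A sketch, assuming per-rule tracking outside Soar; `alpha` and `tau` are tuning assumptions, not Soar settings.

```python
class ConvergenceGate:
    """Track an exponential moving average of |delta-Q| per RL rule;
    flag a rule as converged once the EMA drops below tau."""

    def __init__(self, alpha=0.1, tau=0.01):
        self.alpha, self.tau = alpha, tau
        self.ema = {}   # rule id -> EMA of |delta Q|

    def update(self, rule, delta_q):
        # First update seeds the EMA at |delta_q| itself.
        prev = self.ema.get(rule, abs(delta_q))
        self.ema[rule] = (1 - self.alpha) * prev + self.alpha * abs(delta_q)

    def converged(self, rule):
        return rule in self.ema and self.ema[rule] < self.tau
```

Once `converged` returns true for every rule backing a decision, that decision can be flagged deterministic and handed to chunking.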

ACT-R’s production compilation + utility learning is the closest parallel in another architecture, but there too the interaction is uncoordinated: compilation fires whenever two productions fire in sequence, regardless of whether utility values have stabilized. The gating idea is novel.

For Soar:

| Step     | Soar implementation |
|----------|---------------------|
| Track    | For each RL rule, maintain an EMA of \|ΔQ\| across recent updates. |
| Gate     | When EMA drops below threshold τ, the preference has converged. Flag the decision as deterministic. |
| Compile  | Chunking fires on the flagged substate. The resulting chunk encodes the converged policy as a production rule. |
| Continue | RL keeps running on unconverged decisions. Chunking picks off the stable ones as they settle. |
RL-gated chunking: RL rules → TD updates → variance check → if converged, chunking fires → new production rule

Pruning dead chunks

Chunks accumulate. EBBS improved chunk quality (§4, p.10) but nothing prunes the store. The parts bin suggests prototype condensation again: periodically scan the chunk store, identify rules that never fire (dead code) or that are superseded by more specific rules, and retract them.

The trigger is the trauma recurrence heuristic: count(similar_failures) > 1 signals a bad chunk. If the same impasse type recurs despite a chunk that should prevent it, the chunk is wrong. Retract and rechunk.
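A sketch of the retraction trigger, with an illustrative chunk store keyed by impasse signature (the signature function and store layout are assumptions, not Soar's):

```python
from collections import Counter

class ChunkReviewer:
    """Trauma-recurrence heuristic: if the same impasse signature recurs
    more than once despite a chunk that should prevent it, retract that
    chunk so it can be re-learned."""

    def __init__(self):
        self.failures = Counter()

    def record_impasse(self, signature, chunk_store):
        self.failures[signature] += 1
        if self.failures[signature] > 1 and signature in chunk_store:
            del chunk_store[signature]       # retract; rechunk on next impasse
            self.failures[signature] = 0
```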

3. Episodic discrimination

Every decision cycle produces an episode (§7, p.13). At ~50ms per cycle (§10, item 3, p.18), that’s 72,000 episodes per hour. Retrieval cost grows with total count.

|           | Before                                  | After                                               |
|-----------|------------------------------------------|-----------------------------------------------------|
| Storage   | Every decision cycle → episode           | Only novel/important cycles stored at full fidelity |
| Volume    | 72,000 episodes/hour, unbounded          | Fraction stored, bounded by importance threshold    |
| Retrieval | Old episode cost grows with total count  | Smaller store, faster retrieval                     |
| Quality   | Routine episodes dilute the store        | DPP thinning ensures stored episodes are diverse    |

Contract: Retrieval cost stays proportional to important episodes, not total decision cycles.

DPP importance gate

The simplest approach is surprise-based gating: store an episode when its content deviates significantly from predictions. Isele & Cosgun (AAAI 2018) identified four criteria for selective experience replay: surprise (high TD error), reward, distribution matching, and coverage maximization. For write-time gating in Soar:

| Signal     | Implementation |
|------------|----------------|
| Relevance  | Reward proximity (positive or negative), impasse resolution, operator application that changed the goal stack. |
| Similarity | WME overlap with recent episodes. High overlap = routine. Low overlap = novel. |
| Gate       | Episodes with high relevance and low similarity to recent episodes are stored at full fidelity. Routine episodes are stored at reduced fidelity or skipped. |
Episodic discrimination: decision cycle → importance score → DPP gate → novel episodes stored at full fidelity, routine episodes skipped
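A sketch of the per-cycle gate, modeling episodes as sets of WMEs. The Jaccard overlap measure and the `overlap_max` cutoff are illustrative choices, not part of Soar.

```python
def should_store(episode, recent, reward, resolved_impasse, overlap_max=0.8):
    """Write-time gate: store at full fidelity when relevance is high
    (nonzero reward or an impasse was resolved) and WME overlap with
    recent episodes is low."""
    relevant = reward != 0 or resolved_impasse
    overlap = max((len(episode & r) / len(episode | r) for r in recent),
                  default=0.0)
    return relevant and overlap < overlap_max
```

The gate is cheap enough to run every decision cycle; it only touches the current episode and a small window of recent ones.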

For agents that encounter many similar surprising episodes (e.g., repeated failures at the same task), a periodic DPP-based thinning pass can ensure the stored set stays diverse. This separates the hot path (surprise gate, cheap, per-cycle) from the batch operation (DPP thinning, periodic, ensures diversity). Neuroscience supports this two-phase model: synaptic tagging marks salient episodes at encoding time; sharp-wave ripples during sleep selectively consolidate them (Science, 2024).
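The batch pass can be sketched with greedy diversity selection, a cheap stand-in for exact DPP sampling that maximizes diversity in the same spirit (the similarity measure and selection rule are illustrative):

```python
def jaccard(a, b):
    """Set-overlap similarity between two episodes."""
    return len(a & b) / len(a | b) if a | b else 0.0

def thin(episodes, k, sim=jaccard):
    """Greedy diversity thinning: repeatedly keep the episode least
    similar to anything already kept, until k survive."""
    kept = [episodes[0]]
    while len(kept) < min(k, len(episodes)):
        candidates = [e for e in episodes if e not in kept]
        kept.append(min(candidates,
                        key=lambda e: max(sim(e, s) for s in kept)))
    return kept
```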

Fewer episodes, faster consolidation

Treatments 1 and 3 reinforce each other. Discrimination reduces the volume; consolidation reads the survivors and compresses them into semantic knowledge.

Four cells, four algorithms

| Treatment               | Parts bin cell           | Soar module                 | Effect |
|-------------------------|--------------------------|-----------------------------|--------|
| Episodic→Semantic       | Graph coarsening         | EPMEM → SMEM                | World knowledge grows from experience |
| RL-gated chunking       | EMA convergence gate     | RL rules → production rules | Stochastic selection compiles to deterministic rules |
| Chunk review            | Trauma recurrence        | Procedural memory           | Dead and wrong chunks retracted |
| Episodic discrimination | Surprise gate + DPP thin | Episodic learning           | Store fewer, better episodes |

None of these require new architectural commitments. The algorithms exist. The modules exist. The triggers are straightforward.

And yet.

The wall behind the walls

Gate chunking on RL convergence. Consolidate episodes into semantic knowledge. Discriminate what’s worth remembering. Soar would be a more complete architecture. It would still need a human.

Laird says it plainly: “Without a human to closely guide it, Rosie is unable to get very far on its own, especially in being unable to learn new abstract symbolic concepts on its own” (§10, p.20). Rosie is Soar’s most capable agent. It learns sixty tasks from natural language instruction. It needs an instructor for every one.

This isn’t a failure of Rosie. It’s a pattern. TacAir-Soar’s 8,000 rules were written by engineers. PROPS takes declarative rule descriptions authored by people. Every successful Soar agent in the appendix has a human somewhere in the causal chain, providing knowledge that the architecture cannot generate from experience alone.

Soar treats this as a development-time dependency, something to be engineered away with the next learning mechanism. Laird knows better. His own assessment: “What I feel is most missing from Soar is its ability to ‘bootstrap’ itself up from the architecture and a set of innate knowledge into being a fully capable agent” (§10, p.20). Forty years of evidence supports the stronger reading: the human is load-bearing.

Complementation first

The research program has been: build the autonomous agent, then optionally add human interaction. Instructo-Soar and Rosie add a teacher, but as one application among many. The architecture doesn’t require it.

What if the order is reversed?

Step one: complementation. Human and agent, each filling the other’s gaps. The agent has speed, consistency, parallel rule firing, tireless processing; the human has judgment, novel category formation, the ability to say “this is what matters.” Neither is intelligent alone. Together they are.

Step two: bootstrap. Use that composite intelligence to study what it does. Compress its patterns. Gradually move capabilities from the human side to the agent side. Chunking already does this at the substate level: compile deliberation into direct rules. The same principle, applied at the architecture level: compile human guidance into autonomous capability.

Soar’s forty-year trajectory suggests that generating the knowledge to fill these cells requires more intelligence than any single agent has demonstrated. If the architecture can’t generate it alone, it has to borrow it. The only place it currently exists at the required level is people. Solving complementation first isn’t a retreat from autonomy. It’s the path to it.

Soar already has the mechanism. An impasse means “I don’t have enough knowledge.” A substate means “go get it.” What if the default resolution of certain impasses was “ask the human”? Not as a fallback. As a first-class architectural response.

Chunking would compile the answer. The agent never asks the same question twice. Over time, fewer impasses reach the human. The boundary moves.

The four prescriptions above are correct. They fill real gaps. But they’re step two. Step one is acknowledging that the human who has always been there, quietly making every Soar agent work, belongs in the architecture.

What flows back

The prescription draws from the parts bin. Soar gives back. Six algorithms from Soar’s architecture are now in the parts bin:

| Algorithm              | Parts bin cell      | What it does |
|------------------------|---------------------|--------------|
| RETE network           | Cache × graph       | Incremental pattern matching. Processes WME deltas, caches partial match state. Pays for change, not total knowledge. |
| Truth maintenance      | Filter × graph      | I-supported structures auto-retract when their creating rule unmatches. Data-driven, no explicit delete. |
| Staged preferences     | Attend × flat       | Reject-first, rank-survivors. Process prohibit/reject before better/worse/best. Exit early if rejection resolves it. |
| EBC/Chunking           | Consolidate × tree  | Backtrace through dependency tree, identify necessary conditions, compile deliberation into production rule. |
| Delta-bar-delta        | Consolidate × flat  | Per-production adaptive learning rate. Each rule gets its own alpha, updated on every RL update. |
| Delta episodic storage | Remember × sequence | Store only changes between snapshots. Interval representation for persistent elements. |

Soar’s largest contribution is forty years of evidence for the ordering constraint. Every agent in the appendix is a data point: the right mechanisms, assembled by a human. Together they produced intelligent behavior that neither could produce alone. That’s complementation. It was always complementation. The architecture just didn’t have a name for it.


Based on Laird (2022), “Introduction to the Soar Cognitive Architecture”; Casteigts et al. (2019), “Computing Parameters of Sequence-Based Dynamic Graphs”; Rasmussen (2025), “Zep: A Temporal Knowledge Graph Architecture”; the parts bin; and the Natural Framework. Written via the double loop.