Diagnosis: Soar
Part of the cognition series. See also: SOAP Notes: Soar and Prescription: Soar.
Soar is among the most ambitious artifacts in computer science. Where most AI research optimizes a single capability, John Laird and his collaborators spent forty years building the whole mind, taking Allen Newell’s challenge literally (Laird, 2022, §intro). Every module earns its place, every mechanism connects through a single central hub, and the decision cycle stages parallel rule firing into sequential action. Soar works.
The diagnosis is based on Laird’s 2022 introduction, the Gentle Introduction (Lehman, Laird, & Rosenbloom, 2006), Derbinsky & Laird (2013) on forgetting in Soar, the open-source implementation and its manual, and correspondence with Laird.
Observations
Soar is a set of interacting task-independent modules, as opposed to a single pipeline (§1, p.2). Figure 1 of Laird (2022) shows the structure: five memories (Procedural, Semantic, Episodic, Symbolic Working Memory, Perceptual LT Memory), a Preference Memory buffer between elaboration and the decision procedure, four learning modules (Chunking, RL, Semantic Learning, Episodic Learning), the Decision Procedure, the Spatial-Visual System, and Embodiment (Perception, Motor).
Each module has its own knowledge representation, retrieval mechanism, and learning method (Figure 6, §9.2, p.17). The decision cycle is the top-level pipeline that orchestrates them. Diagnosing Soar means diagnosing each module individually.
What Soar gets right
The impasse mechanism is a work of engineering. When knowledge is insufficient to select or apply an operator, Soar creates a substate and reasons about the gap (§3, p.7). The same decision cycle runs recursively in the substate, with full access to all reasoning and memory capabilities. This single mechanism unifies planning, hierarchical task decomposition, metacognition, and deliberate operator evaluation (§3.3, p.9). Most architectures bolt these on. Soar derives them.
The decision cycle stages elaboration before selection. Elaboration rules fire in causally dependent waves: situation elaboration, then operator proposal, then operator evaluation (§2.2, p.5). Evaluation rules create reject, better/worse, best/worst, and numeric preferences that determine which proposed operator should be selected (§2.2.3, p.6). The decision procedure processes these in a fixed eight-step sequence: require, collect acceptable, prohibit, reject, then better/worse, best, worst, indifferent. Rejection before ranking, confirmed in run_preference_semantics().
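The fixed preference-processing order can be sketched in a few lines. This is a toy model under assumed inputs: the tuple format and the `decide()` helper are hypothetical illustrations of the sequence described in §2.3, not the actual `run_preference_semantics()` from the Soar source.

```python
def decide(prefs):
    """Sketch of the eight-step preference order: require, collect acceptable,
    prohibit, reject, better/worse, best, worst, indifferent.
    prefs: ("require"|"acceptable"|"prohibit"|"reject"|"best"|"worst", op)
    or ("better", winner, loser) tuples."""
    required = [p[1] for p in prefs if p[0] == "require"]
    if required:
        return required[0]                                        # 1. require short-circuits
    candidates = {p[1] for p in prefs if p[0] == "acceptable"}    # 2. collect acceptable
    candidates -= {p[1] for p in prefs if p[0] == "prohibit"}     # 3. prohibit
    candidates -= {p[1] for p in prefs if p[0] == "reject"}       # 4. reject -- before any ranking
    for _, winner, loser in (p for p in prefs if p[0] == "better"):
        if winner in candidates:
            candidates.discard(loser)                             # 5. better/worse
    best = candidates & {p[1] for p in prefs if p[0] == "best"}
    if best:
        candidates = best                                         # 6. best
    worst = {p[1] for p in prefs if p[0] == "worst"}
    if candidates - worst:
        candidates -= worst                                       # 7. worst (unless it empties the set)
    if len(candidates) == 1:
        return candidates.pop()
    return "IMPASSE"   # 8. remaining ties fall to indifferent selection, else an impasse
```

The ordering matters: because step 4 runs before steps 5–7, a rejected operator can never win on a better/best preference, which is the "rejection before ranking" property noted above.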
The architecture arrived at this answer through forty years of building agents that had to work. Soar started in 1983 as a problem-solving architecture. The early agents exposed what was missing: no way to learn from deliberation (chunking was added), no way to handle uncertainty in selection (RL was added in 2005), no way to remember facts or experiences (semantic and episodic memory were added in 2006–2008), no way to reason about space (SVS was added). Each addition came from running into a wall while building a real agent, then extending the architecture to get past it.
Chunking is the cleanest learning mechanism in any cognitive architecture. It backtraces through the dependency chain, identifies which superstate conditions were necessary, and writes a production rule that fires directly next time (§4, p.9–10). Deliberation compiles into reaction. EBBS, explanation-based behavior summarization, ensures the learned rule is correct relative to the substate reasoning and as general as possible without being over-general (§4, p.10). Among cognitive architectures, this is the most principled compiler.

The combinations are unique. Soar is the only architecture where (§9.3, p.17):
- RL learns retrievals from episodic memory (Gorski & Laird, 2011)
- Mental imagery simulates actions to detect collisions that inform RL (Wintermute, 2010)
- Chunking compiles planning into evaluation rules, then RL tunes the initial values (Laird, 2011)
- Episodic memory, metareasoning, and chunking combine for one-shot learning of operator evaluation knowledge (Mohan, 2015)
These emerge from the architecture. The decision cycle, working memory, and the impasse mechanism make them composable.
Real-time with millions of knowledge elements. Soar achieves a decision cycle of ~50ms even with millions of rules, facts, and episodes (§10, item 3, p.18). The RETE network’s incremental matching and episodic memory’s delta-based storage keep costs proportional to change rather than total knowledge. In the source, add_wme_to_rete() and remove_wme_from_rete() propagate individual WME changes through only the affected beta nodes rather than re-matching the entire rule base.
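The cost-proportional-to-change idea can be illustrated with a drastically simplified matcher. This is an assumed toy model, not the RETE: real `add_wme_to_rete()` shares alpha/beta nodes across rules and handles variable bindings, while the sketch below only shows that an index from WME to rule makes match work scale with the delta, not the knowledge base.

```python
from collections import defaultdict

class IncrementalMatcher:
    """Toy incremental matcher. Each rule is a set of required (attr, value)
    pairs; a WME change touches only the rules indexed on that pair."""
    def __init__(self, rules):
        self.unmet = {name: set(conds) for name, conds in rules.items()}
        self.index = defaultdict(list)            # (attr, value) -> rule names
        for name, conds in rules.items():
            for c in conds:
                self.index[c].append(name)

    def add_wme(self, wme):
        fired = []
        for name in self.index.get(wme, []):      # only rules testing this WME
            self.unmet[name].discard(wme)
            if not self.unmet[name]:
                fired.append(name)                # all conditions met: instantiate
        return fired

    def remove_wme(self, wme):
        for name in self.index.get(wme, []):
            self.unmet[name].add(wme)             # rule must re-match this condition
```

Adding a WME visits only the rules whose conditions mention it; a million unrelated rules cost nothing on that cycle.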
Demonstrated in real systems. Over the years, Soar agents have been embodied in real-world robots, computer games, and large-scale distributed simulation environments (§intro, p.1). These include:
- Rosie: learns new tasks from real-time natural language instruction, acquires task structures interactively, the most capable demonstration of Soar’s learning integration (§10, item 2, p.18; Lindes, 2022)
- Real-world robots: over 20 Soar-controlled robots with real-time decision-making, planning, and spatial reasoning via SVS (§10, item 4, p.18)
- Large-scale military simulations: agents incorporating real-time decision-making, planning, natural language understanding, metacognition, theory of mind, and mental imagery (§intro, p.1; Stearns, 2021)
- Human behavior modeling: detailed cognitive models that predict human performance (Schatz et al., 2022)
- Some agents have run uninterrupted for 30 days (§10, item 9, p.19)
Laird rates Soar on 15 capabilities derived from Newell (1990) and later extensions: 8 as “yes,” 5 as “partial,” and 2 as “no” (§10, p.17–20). These are demonstrated in deployed agents across domains.
The Decision Cycle (top-level pipeline)
All five forward phases functional. Elaboration rules compute abstractions and propose operators (§2.2.1–2.2.2). Evaluation rules create reject, better/worse, best/worst, and numeric preferences (§2.2.3). The fixed decision procedure processes rejects before ranking, then chooses a single operator or declares an impasse (§2.3). Soar supports Q-learning, SARSA, and eligibility traces for numeric preferences (§5, fn.5).
The elaboration phase fires rules in parallel waves: “a common progression that starts with a wave of elaboration rule firings, followed by a wave of operator proposal, and finally a wave of operator evaluation” (§2.2, p.5). Evaluation can’t fire until proposals exist. Laird confirmed in correspondence that “the results of this overall phase would be exactly the same if the roles were split and run sequentially.” Causally dependent rather than explicitly sequenced.
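The wave structure falls out of data dependence, which a small fixed-point loop can illustrate. A hedged sketch with an assumed rule format (condition set, addition set), not Soar's rule language:

```python
def run_elaboration_phase(rules, wm):
    """Fire all matching rules in parallel waves until quiescence.
    rules: list of (conditions, additions), both sets of WME tokens.
    Evaluation rules naturally land a wave after proposal rules because
    their conditions test structures the proposals create."""
    waves = []
    while True:
        wave = set()
        for conds, adds in rules:
            if conds <= wm:              # rule matches current working memory
                wave |= adds - wm        # each instantiation fires only once
        if not wave:
            break                        # quiescence: no new WMEs
        wm |= wave
        waves.append(wave)
    return waves
```

Running it on an elaboration, a proposal that depends on it, and an evaluation that depends on the proposal yields three waves without any explicit sequencing, which is the point of Laird's remark: the result would be identical if the phases were split and run sequentially.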
Symbolic Working Memory (memory) ✅
Working memory “maintains an agent’s situational awareness, including perceptual input, intermediate reasoning results, active goals, hypothetical states, and buffers” (§1, p.2). Justification-based truth maintenance (§2.2, p.5) provides automatic retraction: I-supported structures retract when their creating rule no longer matches. Working memory doesn’t rank and doesn’t learn. That’s by design.
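Justification-based retraction can be sketched as a rebuild-from-inputs fixed point: an I-supported structure exists exactly as long as some rule chain from current inputs derives it. An assumed simplification of the JTMS (the real mechanism retracts incrementally per instantiation rather than recomputing):

```python
def recompute(wm_inputs, rules):
    """Derived closure of working memory from current inputs.
    rules: list of (conditions, additions) sets. Anything not derivable
    from the inputs is gone -- automatic retraction by construction."""
    wm = set(wm_inputs)
    changed = True
    while changed:
        changed = False
        for conds, adds in rules:
            if conds <= wm and not adds <= wm:
                wm |= adds
                changed = True
    return wm
```

Remove the input and every structure that depended on it disappears with it, with no cleanup rules required.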
Procedural Memory (memory) ✅
The only store with automatic learning. The RETE processes only changes to working memory: “rules fire only once for a specific match to data in working memory (this is called an instantiation)” (§2.2, p.5). No selection among rules: all matched instantiations fire in parallel. Rules don’t compete. Operators do. Chunking (§4) and RL (§5) both write to this store.
Semantic Memory (memory)
Five of six phases functional. Retrieval uses “a combination of base-level activation and spreading activation to determine the best match, as used originally in ACT-R” (§6, p.12). Base-level activation biases by recency and frequency; spreading activation biases toward concepts linked to currently active working memory structures. But: “Soar does not have an automatic learning mechanism for semantic memory, but an agent can deliberately store information at any time” (§6, p.13). The store grows only by hand or preloading (WordNet, DBpedia). The implementation has activation-based ranking for retrieval but no eviction. The word “forget” does not appear anywhere in the semantic memory source. Nothing is ever removed.
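The retrieval bias can be sketched with the standard ACT-R base-level equation plus a crude spreading term. The scoring function, candidate format, and parameter values below are illustrative assumptions, not smem's actual arithmetic:

```python
import math

def base_level(access_times, now, d=0.5):
    """ACT-R style base-level activation: ln of summed power-law decay over
    past accesses. d=0.5 is ACT-R's conventional default decay rate."""
    return math.log(sum((now - t) ** -d for t in access_times))

def retrieve(candidates, now, active, spread_weight=1.0):
    """Rank candidates by recency/frequency (BLA) plus links to currently
    active working-memory structures (spreading activation)."""
    def score(c):
        spread = spread_weight * len(set(c["links"]) & active)
        return base_level(c["accesses"], now) + spread
    return max(candidates, key=score)
```

With nothing active, the more frequently accessed concept wins; activate a linked structure and the spreading term can flip the outcome, which is the context-sensitivity the mechanism buys.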
Episodic Memory (memory)
Five of six phases functional. “A new episode is automatically stored at the end of each decision” (§7, p.13). “Soar minimizes the memory overhead of episodic memory by storing only the changes between episodes” (§7, p.13). In the source, epmem_new_episode() processes only epmem_wme_adds and epmem_node_removals; unchanged WMEs stay in _now tables without re-storage. But “memory does grow over time, and the cost to retrieve old episodes slowly increases as the number of episodes grows” (§7, p.13). Episodic learning “does not have generalization mechanisms” (§7, p.13). The implementation confirms it: “The current episodic memory implementation does not implement any episodic store dynamics, such as forgetting.” No max-episode count, no eviction policy, no capacity bound. The source declares removal structures (epmem_id_removal_map) but never populates them. The store is append-only. At ~50ms per decision cycle, that’s 72,000 episodes per hour into an unbounded SQLite store.
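Delta-based storage amounts to interval encoding: each WME is written once with a validity interval, so an unchanged WME costs nothing per cycle. A sketch under assumed names (the class and methods below are hypothetical; the real store is a SQLite schema, not Python dicts):

```python
class DeltaEpisodicStore:
    """Each WME is stored once with a [start, end) validity interval;
    open intervals live in `now`, loosely analogous to the _now tables."""
    def __init__(self):
        self.closed = []          # (wme, start_cycle, end_cycle)
        self.now = {}             # wme -> start cycle, still valid
        self.cycle = 0

    def record(self, adds, removals):
        """Store only the changes for this decision cycle."""
        self.cycle += 1
        for w in adds:
            self.now[w] = self.cycle
        for w in removals:
            self.closed.append((w, self.now.pop(w), self.cycle))

    def episode(self, t):
        """Reconstruct working memory as of cycle t."""
        wm = {w for w, s, e in self.closed if s <= t < e}
        wm |= {w for w, s in self.now.items() if s <= t}
        return wm
```

Note what the sketch also makes visible: `closed` and `now` only ever grow. Nothing in the scheme bounds the store; that requires a forgetting policy the implementation does not have.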
Spatial-Visual System (memory)
Five forward phases functional. “An agent uses operators to issue commands to SVS that create filters” that “automatically extract symbolic properties” (§8, p.14). Top-down control of what gets symbolized. “SVS supports hypothetical reasoning…through the ability to project non-symbolic structures into SVS” (§8, p.14). Laird confirmed in correspondence: “There is filtering at this phase as well.” The image memory system is “still experimental” (§9.2, Figure 6).
Chunking (learning)
Five of six phases functional. Chunking “compiles the processing in a substate into rules that create the substate results” (§4, p.9). It “back-traces through the rule that created them” (§4, p.10) to find superstate conditions. EBBS (“explanation-based behavior summarization”) (§4, p.10) ensures chunks are correct and as general as possible. But “chunking requires that substate decisions be deterministic…Therefore, chunking is not used when decisions are made using numeric preferences” (§4, p.10). In the source, numeric indifferent preferences are marked with NOTHING_DECIDER_FLAG, preventing them from entering the OSK structures that feed chunking.
Laird has the right plan: “We have plans to modify chunking so that such chunks are added to procedural memory when there is sufficient accumulated experience to ensure that they have a high probability of being correct” (§4, p.10). Gate chunking on RL convergence. The implementation doesn’t exist yet.
Reinforcement Learning (learning)
Five of six phases functional. “RL modifies selection knowledge so that an agent’s operator selections maximize future reward” (§5, p.11). “RL in Soar applies to every active substate,” a natural fit for hierarchical RL (§5, p.12). Global learning rate and discount rate are “fixed at agent initialization” (§5, fn.5). In the source, rl_perform_update() retrieves learning_rate (default 0.3) and discount_rate (default 0.9) as global parameters. In normal_decay mode, every rule gets the same alpha. Delta-bar-delta mode gives each production its own rl_delta_bar_delta_beta and rl_delta_bar_delta_h, adapting per-rule learning rates. But delta-bar-delta must be explicitly enabled, and the meta-learning rate that drives it is itself a global parameter.
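The shape of the update can be sketched as a SARSA-style TD step over the numeric preferences of the rules that supported the last selection. The function below is a hypothetical illustration, not `rl_perform_update()`; the defaults mirror the global learning_rate (0.3) and discount_rate (0.9), and `per_rule_alpha` stands in for delta-bar-delta's adapted per-production rates.

```python
def rl_update(rule_values, fired_rules, reward, next_q,
              alpha=0.3, gamma=0.9, per_rule_alpha=None):
    """TD update distributed across the rules whose numeric preferences
    supported the selected operator."""
    q = sum(rule_values[r] for r in fired_rules)   # operator value = summed preferences
    delta = reward + gamma * next_q - q            # TD error
    share = delta / len(fired_rules)               # split the error across rules
    for r in fired_rules:
        a = per_rule_alpha.get(r, alpha) if per_rule_alpha else alpha
        rule_values[r] += a * share
```

In normal_decay mode every rule uses the same `alpha`; passing `per_rule_alpha` is where delta-bar-delta's per-rule adaptation would plug in.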
Semantic Learning (learning)
Not yet autonomous. Three of six phases are agent-directed rather than architectural. “An agent can deliberately store information at any time” (§6, p.13) but there’s no relevance gating, no prioritization, no self-update. It’s a raw store() call. Laird himself rates semantic learning as still “missing” among “types of architectural learning” (§10, item 7, p.18).
Episodic Learning (learning)
Automatic but undiscriminating. “A new episode is automatically stored at the end of each decision” (§7, p.13). In the source, epmem_go() is called during the output phase; with the default trigger mode dc, epmem_consider_new_episode unconditionally sets new_memory = true: no gating, no discrimination. “An agent can further limit the costs of retrievals by explicitly controlling which aspects of the state are stored, usually ignoring frequently changing low-level sensory data” (§7, p.13). But no mechanism discriminates which episodes are worth keeping.
The forgetting asymmetry
Forward passes work. No learning module maintains itself.
| Stack | Type | Forward pass | Learning |
|---|---|---|---|
| Decision Cycle | Top-level | Functional | Partial |
| Symbolic Working Memory | Memory | Functional | Nil (expected) |
| Procedural Memory / RETE | Memory | Functional | Functional |
| Semantic Memory | Memory | Functional | Missing |
| Episodic Memory | Memory | Functional | Missing |
| SVS / Perceptual LTM | Memory | Functional | Partial |
| Chunking | Learning | Functional | Missing |
| Reinforcement Learning | Learning | Functional | Partial (delta-bar-delta) |
| Semantic Learning | Learning | Functional | Missing |
| Episodic Learning | Learning | Functional | Missing |
These gaps share a single root cause.
Derbinsky & Laird (2013) proved that forgetting is essential to Soar’s scaling. Without it, a robot exploring a building exceeded the 50ms decision-cycle threshold within an hour as working memory grew past 12,000 elements. With base-level activation forgetting, working memory stayed at ~2,000 elements and decision time stayed under budget. In Liar’s Dice, memory grew as a power law without forgetting, reaching 1,800MB after 40,000 games. With forgetting, it stabilized at ~400MB while maintaining task performance.
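Their working-memory mechanism is a decay threshold over base-level activation. A minimal sketch, assuming illustrative parameter values (d and the threshold are not the paper's tuned numbers), not the actual implementation with its bsearch/approx prediction modes:

```python
import math

def should_forget(access_times, now, d=0.5, threshold=-2.0):
    """A WME is removed once its base-level activation decays below a
    fixed threshold. Activation depends only on access timestamps."""
    activation = math.log(sum((now - t) ** -d for t in access_times))
    return activation < threshold

# A WME accessed once at t=0 has activation ln(T^-0.5) = -0.5*ln(T) at time T,
# which crosses -2.0 near T = e^4, roughly 55 cycles after its last access.
```

Each access resets the clock upward, so structures the agent keeps touching stay; everything else drains away, which is what held working memory near ~2,000 elements in their robot experiment.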
They built forgetting for working memory and procedural memory; they never built it for episodic or semantic memory.
The manual confirms it: “The current episodic memory implementation does not implement any episodic store dynamics, such as forgetting.” The source declares removal structures (epmem_id_removal_map) but never populates them. Semantic memory has no forgetting discussion at all. The working memory activation header has an explicit forgetting_choices { disabled, naive, bsearch, approx } enum; the semantic memory module has nothing comparable. Both long-term stores grow without bound.
This asymmetry is the root cause.
The dominoes
1. Perception stays narrow. R4 dictates that the mechanism only removes elements from working memory that augment objects in semantic memory (Derbinsky & Laird, 2013, §5). You can only safely forget what you can reconstruct. But semantic memory has no automatic learning (§6, p.13). It grows only by hand or preloading.
R4 restricts the drain rather than perception. New WMEs enter working memory freely. The constraint is on what leaves: anything perceived outside the semantic vocabulary can’t be forgotten, so it accumulates. The RETE scales linearly with WM size (Derbinsky & Laird, 2013, §3). Growing WM pushes decision time past the 50ms threshold. So agents compensate, “usually ignor[ing] frequently changing low-level sensory data” (§7, p.13) as a scaling necessity. The bottleneck is at the drain. A clogged drain forces you to close the valve.
Compare:
| System | Input | Working memory | Compression |
|---|---|---|---|
| Human | ~10 Mbit/s optic nerve (Koch et al., 2006) | ~200 bits: 7±2 chunks (Miller, 1956) | ~50,000 : 1 |
| Soar | pre-symbolized input-link, SVS filters | 2,000–12,000 WMEs, growing without bound (D&L, 2013) | ~1 : 1 |
The brain’s input bandwidth is enormous because its working memory is disciplined. Forgetting is aggressive at every level. Lateral inhibition in the retina compresses ~126 million photoreceptors into ~1.2 million optic nerve fibers before signals leave the eye (Barlow, 1961). Only 5–10% of synapses onto thalamic relay cells are retinal; the rest is cortical feedback and modulatory gating (Sherman & Guillery, 2002). Sleep consolidation rewrites cortical representations offline (Diekelmann & Born, 2010). Soar’s input is narrow because its working memory is not. The architecture pre-symbolizes input and throttles what enters. Real-time performance costs the agent its peripheral vision.
The ceiling is set by the input, and the input is set by the stores’ ability to forget.
2. Semantic memory can’t maintain itself. The working-memory forgetting policy assumes semantic memory is the backup: “forgotten working-memory knowledge may be recovered via deliberate reconstruction from semantic memory” (Derbinsky & Laird, 2013, §5). But the backup itself has no forgetting, no automatic learning, and no capacity bound. It grows only by hand or preloading (§6, p.13). The store that everything else depends on for recovery can’t shed and can’t grow.
This creates a catch-22. Adding eviction to semantic memory undermines the safety of the existing mechanism. Derbinsky & Laird flag the risk: “our forgetting policy does admit a class of reasoning errors wherein the contents of semantic memory are changed so as to be inconsistent with decayed WMEs” (2013, §5.2). Today this is a minor edge case because smem rarely changes. But with automatic learning and eviction, smem changes actively. Every smem deletion can orphan WMEs that were “safely” forgotten from working memory under R4. The WME is gone, its backup is gone, reconstruction fails silently.
Making the cold tier dynamic requires coordinating eviction across tiers, a cache coherence problem. The existing JTMS handles dependency-driven retraction within working memory; the missing wiring is back-invalidation across the tier boundary.
3. Chunking and RL can’t compose. “Chunking requires that substate decisions be deterministic…Therefore, chunking is not used when decisions are made using numeric preferences” (§4, p.10). That looks like a pure learning problem, two learning mechanisms that can’t talk to each other, and it partly is. Independent of perception, RL-updated rules “encode expected-utility information…and cannot be regenerated if the rule is removed” (Derbinsky & Laird, 2013, §6), while chunked rules can be regenerated from substates. Without composition, RL rules can’t be safely forgotten, and chunked rules can’t incorporate reward. That’s a learning problem.
But there’s a second, domain-dependent effect. RL converges when it covers enough of the state space. In small, fixed domains — Liar’s Dice, structured games — RL converges fine with throttled perception; Derbinsky & Laird’s agents reached 75–80% win rates within 10K games (2013, §6.1). In open-ended domains where the state space grows with perceptual scope, throttled perception starves RL of the diversity it needs; decisions stay stochastic longer than necessary; chunking stays locked out. Laird’s plan, gate chunking on RL convergence (§4, p.10), is the right mechanism. In fixed domains, convergence happens and the gate opens. In open-ended domains, convergence requires throughput, and throughput is capped by domino 1.
4. Episodic memory grows without bound. 72,000 episodes per hour, no eviction, no discrimination, retrieval cost linear in store size. The standard mitigation is to throttle what gets recorded, which circles back to domino 1.
5. The agent can’t bootstrap. Laird (§10, p.20): “What I feel is most missing from Soar is its ability to ‘bootstrap’ itself up from the architecture and a set of innate knowledge into being a fully capable agent across a breadth of tasks.” This is the same class of problem as every other domino: cache invalidation.
When chunking compiles a substate into a production rule, the chunk claims its place in procedural memory and stays. Procedural forgetting uses base-level activation, but activation tracks invocation rather than efficacy. The working memory activation module implements BLA decay: elements are forgotten when activation drops below threshold, and activation is a function of access timestamps, not outcomes. There is no field for whether a firing led to reward or goal achievement. A chunk that fires every cycle on stale pattern matches stays warm; a chunk that would be useful in a novel situation but hasn’t been triggered decays. This is a property of the BLA model itself, inherited from ACT-R (Anderson et al., 2004).
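The invocation-versus-efficacy gap is easy to demonstrate with the BLA equation itself. The timestamps below are illustrative assumptions; the point is structural: the model has no input for outcomes.

```python
import math

def activation(access_times, now, d=0.5):
    """Base-level activation is a function of access timestamps only --
    there is no term for reward or goal achievement."""
    return math.log(sum((now - t) ** -d for t in access_times))

# A chunk firing every cycle on stale pattern matches vs. a chunk that fired
# usefully once, long ago. BLA sees timestamps, never efficacy, so the busy
# stale chunk always stays warmer than the rarely-triggered useful one.
stale_busy = [float(t) for t in range(90, 100)]   # ten recent, useless firings
useful_rare = [5.0]                               # one old, goal-achieving firing
assert activation(stale_busy, 100.5) > activation(useful_rare, 100.5)
```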
Soar’s chunking makes the consequence acute: chunks that match the old environment crowd the RETE. The agent solves old problems forever.
An efficacy-based activation signal would fix this, but it requires ground truth. RL rules have reward. Chunked rules have substate correctness. Elaboration rules have neither. There’s no general “was this firing useful?” signal for a production that proposes an operator that might not get selected. Until such a signal exists, the practical lever is eviction by composition: once RL-gated chunking compiles a converged rule, the RL rule that generated it becomes reconstructible and can be safely forgotten under the existing BLA policy. The practical signal is reconstructibility, the same principle behind R4. Prune what you can rebuild.
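Eviction by reconstructibility can be stated as a one-line policy. Everything below is hypothetical policy code, not Soar's: the rule format and `prune()` merely encode "forget a decayed rule only if it can be rebuilt," the same principle as R4.

```python
import math

def bla(access_times, now, d=0.5):
    return math.log(sum((now - t) ** -d for t in access_times))

def prune(rules, now, threshold=-2.0):
    """rules: name -> {"accesses": [timestamps], "reconstructible": bool}.
    Decayed + reconstructible (e.g. a chunk regenerable from its substate,
    or an RL rule already compiled by RL-gated chunking) -> safe to forget.
    Decayed but not reconstructible (a raw RL rule) -> must be kept."""
    return {name: r for name, r in rules.items()
            if not (bla(r["accesses"], now) < threshold and r["reconstructible"])}
```

Under this policy an equally decayed chunk is evicted while the RL rule survives, exactly the asymmetry Derbinsky & Laird identify: utility that cannot be regenerated cannot be safely forgotten.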
Rosie learns new tasks from instruction, but only tasks that fit the operator schemas a human pre-built. The perceptual vocabulary is fixed by what’s in semantic memory (domino 1), and the procedural vocabulary is fixed by chunks that never expire. Learning without forgetting is maladaptation. The agent either resets manually or calcifies around its early experience.
Forty years of ambition built every module. The one operation that would connect the long-term stores, merging, is still missing. The full assessment and plan are in the SOAP note.
Appendix: Open questions
Questions this diagnosis cannot answer from the paper, manual, or source code alone.
- The 30-day agents. At 72,000 episodes per hour, a 30-day run accumulates ~52 million episodes (§10, item 9, p.19). How was retrieval cost managed in practice? Was episodic memory disabled?
- Dynamic world, static chunks. EBBS guarantees chunk correctness relative to the substate reasoning at compile time. In long-running agents with changing environments, have chunks been observed firing incorrectly after environmental change?
- Partial evaluation. Chunking specializes general substate reasoning to known superstate conditions and produces a residual program (the chunk) that runs without interpretation. Futamura (1971) described the same operation for programming languages a decade before Soar. Has the connection been explored?
Diagnosis based on Laird (2022), “Introduction to the Soar Cognitive Architecture,” and Derbinsky & Laird (2013), “Effective and efficient forgetting of learned knowledge.” All section references cite the 2022 paper unless noted. Written via the double loop.