Abduction
Part of the cognition series. Sequel to Modes of Reason.
A mechanic taps the alternator and the engine stalls. Two hypotheses fire: test the battery, test the voltage regulator. Nobody taught the mechanic a hypothesis-generation algorithm. The shape of the failure named the next experiment.
If no mechanism for generating deliberate follow-up hypotheses existed, every intuition would arrive at random, in arbitrary order. But science’s knowledge graph keeps expanding, and attention keeps connecting one experiment to the next. So the mechanism is real. Can we formalize and encode it?
The logical primitive
Two samples, one before and one after, and a diff. What flipped is figure; what held is ground. If that operation is encodable, the loop closes.
Every car on the road already runs this. OBD-II reads sensor states, diffs against expected values, and generates fault codes: the mechanic’s intuition, mechanized since 1996. But OBD-II is hardcoded: the fault tree is hand-authored, the hypotheses enumerated in advance. The primitive works; it just never left the engine bay. The same abstraction that tells a mechanic “test the alternator next” is what ML has no general interface for.
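A minimal sketch of that hardcoded shape, assuming nothing about real OBD-II internals: a hand-authored expectation table, a diff against observed sensor values, and pre-enumerated fault labels. The thresholds and labels are illustrative, not real diagnostic codes.

```python
# Hand-authored expectations: the hypotheses are enumerated in advance.
EXPECTED = {
    "system_voltage": (13.5, 14.7),   # volts, engine running (illustrative)
    "coolant_temp":   (80.0, 105.0),  # Celsius (illustrative)
}

FAULTS = {
    "system_voltage": "charging voltage out of range",
    "coolant_temp":   "coolant temperature out of range",
}

def scan(observed: dict[str, float]) -> list[str]:
    """Diff observed sensor values against the expectation table."""
    faults = []
    for sensor, (lo, hi) in EXPECTED.items():
        value = observed.get(sensor)
        if value is None or not (lo <= value <= hi):
            faults.append(FAULTS[sensor])
    return faults
```

The diff only selects among hypotheses someone already wrote down; it never generates a new one.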
The intuition underneath is diff: snapshot before, snapshot after, XOR. The prefixes are degrees of freedom:
- Unary diff (one before, one after) → one frame
- Bi-abduction → infer frame autonomously from observation
- Incorrectness → flip polarity (attend to failure, not success)
- Tri-abduction → diff across branches (cause vs counterfactual)
Each step adds an operand. One snapshot pair → one frame. Two pairs (actual and counterfactual) → one causal edge. N pairs across N branches → a typed subgraph. The pattern stays diff; the arity grows. Everything past the minimal diff is optimization: more samples (Ernst/Daikon), autonomous frame inference (bi-abduction), flipped polarity (incorrectness), branching (tri-abduction). The primitive already works at two samples.
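A sketch of the arity ladder, with snapshots as plain dicts and hypothetical names: one pair gives a frame, two pairs (actual vs counterfactual) give a causal edge by diffing the diffs.

```python
def diff(before: dict, after: dict) -> dict:
    """Unary diff: what flipped is figure, what held is ground."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}

def causal_edge(actual: tuple[dict, dict],
                counterfactual: tuple[dict, dict]) -> dict:
    """Two snapshot pairs: keep only the flips the counterfactual branch
    does not share, i.e. the change tied to the intervention rather than
    to the passage of time."""
    flipped = diff(*actual)
    also_flipped = diff(*counterfactual)
    return {k: v for k, v in flipped.items() if also_flipped.get(k) != v}
```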
Three witnesses
Infer (Facebook, from Calcagno et al. 2009): bi-abduction running in production on millions of lines of code. Formal observation + formal goal → formal hypothesis. The primitive, deployed.
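A set-based caricature of what Infer solves, not its actual machinery: the real engine works the entailment observed ∗ ?anti_frame ⊢ goal ∗ ?frame over separation-logic heap formulas; sets of atomic facts stand in here.

```python
def biabduce(observed: frozenset, required: frozenset) -> tuple[frozenset, frozenset]:
    """Infer both halves at once:
    anti_frame: facts the goal needs that the observation lacks (the hypothesis)
    frame:      facts the observation holds that the goal never touches
    """
    anti_frame = required - observed
    frame = observed - required
    return anti_frame, frame

missing, untouched = biabduce(
    observed=frozenset({"x is allocated"}),
    required=frozenset({"x is allocated", "x.next is allocated"}),
)
# missing == {"x.next is allocated"}: the inferred precondition.
```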
Voyager (Wang et al. 2023): GPT-4 observes failure, proposes a fix, self-verifies, commits to a skill library. The loop closes on its own. 180 verified skills in Minecraft. LLM as abduction engine.
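A caricature of that loop. The propose, execute, and verify callables stand in for the GPT-4 prompts and the Minecraft executor; none of this is the paper's actual interface.

```python
def voyager_loop(task, propose, execute, verify, skills, max_rounds=5):
    """Observe failure, abduce a fix, self-verify, commit to the skill library."""
    feedback = None
    for _ in range(max_rounds):
        code = propose(task, skills, feedback)   # abduce a candidate fix
        ok, error = execute(code)                # try it in the environment
        if ok and verify(task, code):            # self-verify the outcome
            skills.append(code)                  # commit to the skill library
            return code
        feedback = error                         # the failure shapes the next proposal
    return None
```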
IRM (Arjovsky et al. 2019): invariant feature separation across training environments. Use environment variation as the lever to force figure/ground separation. The ML field’s version, closest to scaling.
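The lever in code: IRMv1's penalty is the gradient of each environment's risk with respect to a dummy classifier scale fixed at 1.0; features predictive in only some environments inflate it. A sketch assuming PyTorch and a binary task, with a hypothetical function name.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Squared gradient of this environment's risk w.r.t. a dummy scale.
    Near zero only if the same classifier is optimal here too, which is
    what forces the figure/ground split onto invariant features."""
    scale = torch.ones(1, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

# Per-batch objective (sketch): sum over environments e of
#   risk_e + lambda * irmv1_penalty(logits_e, y_e)
```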
Three fields, three encodings, zero cross-citation. Peirce named the genus in 1903. Everyone else described their species.
The monad move
Instead of arguing whether SOAR’s chunking and bi-abduction are “the same operation,” encode abduction as a typed primitive and run it. Let results settle the debate. The artifact is the crosswalk.
abduct_candidates(observation, target) → list[Candidate]. The test is both the evidence and the argument.
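One possible shape for that signature, a minimal sketch: the Candidate fields and the one-candidate-per-flipped-key policy are assumptions, not a settled interface.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    hypothesis: str   # the next experiment, named
    evidence: dict    # the flipped keys that motivated it
    frame: dict       # what held constant (the ground)

def abduct_candidates(observation: dict, target: dict) -> list[Candidate]:
    """Diff the observed state against the target state and turn each
    flipped key into a candidate perturbation."""
    keys = observation.keys() | target.keys()
    gap = {k: (observation.get(k), target.get(k))
           for k in keys if observation.get(k) != target.get(k)}
    held = {k: v for k, v in observation.items() if k not in gap}
    return [Candidate(hypothesis=f"perturb {k}", evidence={k: gap[k]}, frame=held)
            for k in sorted(gap)]
```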
The missing mode
The Bitter Lesson says general methods leveraging computation beat hand-engineered knowledge. The lesson is right. It’s about induction. Scale the data, scale the compute, let the model find the pattern. The entire scaling-laws era is mode two applied.
But induction doesn’t generate hypotheses. It generalizes from examples someone already chose. Which experiment to run next, which data to collect, which perturbation to try: that’s outside the lesson’s frame. Induction scales within a research program. Abduction generates research programs.
Every capability curve is a sigmoid. The market prices the steep part and extrapolates a line. When the curve flattens, the marginal return on scaling induction drops. The mechanism that finds the next sigmoid: the next architecture, the next question, the next perturbation that starts a new curve. That’s mode three. ML has deduction (type checkers, provers) and induction (everything since the perceptron). The third mode is the gap.
What happens when you point it at a human playthrough, or ten thousand RL rollouts?