Diff

Chapter 4 · Part II begins · Calcagno 2009, O'Hearn 2019, Arjovsky 2019, Rubin 1915

Chapter 3 showed the same abstract schema under eight names. This chapter defines the primitive: snapshot before, snapshot after, XOR. What flipped is figure; what held is ground. Diff is the contrast that abduction operates over.

Mechanic

A driver turns the key and gets nothing: no crank, no start. Yesterday the car started fine. The mechanic turns the key herself, watches the dash lights dim and the starter give a single click, and two hypotheses fire: weak battery, or a bad connection at the terminals. Nobody taught her a hypothesis-generation algorithm. The shape of the failure named the next test.

Where did the hypotheses come from? She compared two states: the car that started yesterday, the car that won't crank today. She noted what changed (dim lights, a single click, voltage sagging under load) and what stayed the same (fuel level, security light, the key turning freely). The changes pointed at the starting circuit. The hypotheses followed from the diff.

Diff is computable. The mechanic performed it in her head. OBD-II, the on-board diagnostics system required on US cars and light trucks since the 1996 model year, performs it in silicon. Facebook Infer performs it on code. The representation changes; the operation stays the same.

Primitive

Diff is only the substrate of abduction. It produces the contrast that abduction operates over.

The mechanic already has every role the definition needs. Her expected state is the car that started yesterday: strong crank, bright lights, a starter that spins. Her observed state is the car in front of her: no crank, dim lights, a single click. Her background frame is everything she trusts to stay put while she reasons, from the fuel level to the behavior of copper wire. And her two hypotheses are small edits to that background (a battery gone weak, a terminal gone loose) that would turn today's observation into an ordinary consequence. The definition below only names these roles:

Given expected state E, observed state O, and background frame B, partition what differs from what remains invariant.

Abduction then uses that partition to generate candidate causes H: minimal revisions to B that would make O unsurprising. The partition itself is the primitive. All eight names in Chapter 3 instantiate it. The simplest encoding: take two snapshots of a system's state, one before an event and one after. The comparison partitions state into two sets:

Figure: what changed. Variables whose values differ between snapshots. In Gestalt terms, what pops out against a stable background.

Ground: what held. Variables whose values stayed the same. The context that remained invariant while the figure shifted. In separation logic (Chapter 5), this is the frame that bi-abduction infers.

Formally:

diff(state_before, state_after) → (figure, ground)

The partition is symmetric. It doesn't matter which snapshot is “first.” But use is asymmetric. The before-state is the baseline; the after-state is the perturbation. The diff tells you what the perturbation touched.
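
A quick check of both halves of that claim, as a sketch. The `diff` below applies the same changed/held rule as the implementation later in this chapter, and the three-variable snapshots are a trimmed, illustrative slice of the no-start scenario.

```python
def diff(before, after):
    """Partition keys into figure (changed) and ground (held)."""
    figure, ground = {}, {}
    for key in dict.fromkeys([*before, *after]):  # ordered union of keys
        if key in before and key in after and before[key] == after[key]:
            ground[key] = after[key]
        else:
            figure[key] = (before.get(key), after.get(key))  # (was, now)
    return figure, ground

yesterday = {"crank": "strong", "dash_lights": "bright", "fuel": "half"}
today     = {"crank": "none",   "dash_lights": "dim",    "fuel": "half"}

fwd_fig, fwd_gnd = diff(yesterday, today)
rev_fig, rev_gnd = diff(today, yesterday)

# Symmetric partition: the same keys land in figure and in ground either way.
print(set(fwd_fig) == set(rev_fig), set(fwd_gnd) == set(rev_gnd))   # True True
# Asymmetric use: swapping the order reverses each (was, now) pair.
print(fwd_fig["crank"], rev_fig["crank"])   # ('strong', 'none') ('none', 'strong')
```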

Caution: diff gives candidates, not causes. A changed variable may be cause, effect, symptom, or coincident noise. The mechanic's diff puts the chief complaint (no crank) next to the diagnostic signs (dim lights, single click, sagging voltage), all in the figure. The diff alone does not know which is upstream. Separating cause from effect requires a dependency graph or an experiment. The diff just names what changed.
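
A dependency graph can make a first, crude pass at this. The sketch below assumes a hypothetical edge map among state variables (here only "no crank implies no start") and drops any figure entry that another figure entry already explains. The edges are supplied by the analyst; the diff cannot produce them.

```python
# Hypothetical causal edges among state variables: parent -> children it drives.
# These are an analyst's assumption, not something the diff computes.
EDGES = {
    "crank": ["engine"],   # no crank implies no start
}

def upstream_only(figure_keys, edges):
    """Drop figure entries that another figure entry already explains."""
    downstream = {child for parent in figure_keys
                  for child in edges.get(parent, [])}
    return [k for k in figure_keys if k not in downstream]

figure_keys = ["crank", "engine", "dash_lights", "starter_sound", "battery_rest"]
print(upstream_only(figure_keys, EDGES))
# ['crank', 'dash_lights', 'starter_sound', 'battery_rest']
```

The four survivors are still only candidates. Dim lights and a single click could be co-effects of one upstream fault, and this pass cannot tell.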

XOR

The simplest instantiation: state is a set of key-value pairs. The figure is every key whose value changed. The ground is every key whose value didn't.

Python

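def diff(before: dict, after: dict) -> tuple[dict, dict]:
    """Compute figure (what changed) and ground (what held).

    Returns (figure, ground).
    figure: keys present in both with different values,
            plus keys added or removed, as (was, now) pairs;
            None marks the missing side of an added or removed key.
    ground: keys present in both with identical values.
    """
    # Union of keys in first-seen order, so output order is stable.
    all_keys = list(dict.fromkeys([*before, *after]))
    figure = {}
    ground = {}

    for key in all_keys:
        # Test membership explicitly: a key stored as None on one side and
        # absent on the other has still changed.
        if key in before and key in after and before[key] == after[key]:
            ground[key] = after[key]
        else:
            figure[key] = (before.get(key), after.get(key))   # (was, now)

    return figure, ground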

Apply it to the no-start:

Python

def diff(before, after):
    all_keys = list(dict.fromkeys([*before, *after]))  # ordered union of keys
    figure, ground = {}, {}
    for key in all_keys:
        if key in before and key in after and before[key] == after[key]:
            ground[key] = after[key]
        else:
            figure[key] = (before.get(key), after.get(key))  # (was, now)
    return figure, ground

before = {
    "crank":         "strong",
    "engine":        "starts",
    "dash_lights":   "bright",
    "starter_sound": "spins",
    "battery_rest":  "12.6V",
    "fuel":          "half",
    "security":      "off",
}

after = {
    "crank":         "none",
    "engine":        "no-start",
    "dash_lights":   "dim",
    "starter_sound": "single click",
    "battery_rest":  "12.1V",
    "fuel":          "half",
    "security":      "off",
}

figure, ground = diff(before, after)

print("FIGURE (what changed):")
for k, (was, now) in figure.items():
    print(f"  {k}: {was} -> {now}")

print("\nGROUND (what held):")
for k, v in ground.items():
    print(f"  {k}: {v}")

Five figure entries: no crank, no start, dim lights, a single click from the starter, and a slightly lower resting voltage. Two ground entries: fuel and the security light, both unchanged.

The hypotheses follow from the figure. Dim lights, a single click, no crank. What connects them? The starting circuit: battery, wiring, the starter motor, the ignition switch. Each can starve a crank; the battery and wiring also sit upstream of the dimming lights, and the battery of the lower resting voltage. The ground tells the mechanic what to deprioritize. Fuel held at half a tank and the security light stayed off, so neither a fuel fault nor an immobilizer lockout is implicated (though that doesn't eliminate them unless the measurement set is complete).
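
The deprioritizing step can be sketched as a lookup from held variables to the subsystems they report on. The `REPORTS_ON` map and the `deprioritized` helper are hypothetical names for illustration; the output lowers priority, it does not rule anything out.

```python
# Hypothetical map from state variables to the subsystem each one reports on.
REPORTS_ON = {
    "fuel":     "fuel_system",
    "security": "immobilizer",
    "crank":    "starting_circuit",
}

def deprioritized(ground, reports_on):
    """Subsystems whose observed readings held steady across the diff.

    Held readings lower a subsystem's priority; they do not clear it,
    since the snapshot may not measure the failure mode that matters.
    """
    return sorted({reports_on[k] for k in ground if k in reports_on})

ground = {"fuel": "half", "security": "off"}
print(deprioritized(ground, REPORTS_ON))   # ['fuel_system', 'immobilizer']
```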

This is the contrast primitive. Hypothesis generation, ranking, and testing build on top of it.

Degrees of freedom

The diff above is the minimal case: one before, one after, one partition. The variants in the literature add degrees of freedom to this primitive:

Variant	Inputs	Outputs	What it adds
Unary diff	One before, one after	Figure + ground	The primitive. This chapter.
Bi-abduction	Partial before, partial after	Inferred frame + inferred anti-frame	Infers the ground autonomously. Ch 5.
Incorrectness	One before, one after	Under-approximation of bugs	Flip polarity: attend to failure, not success.
Tri-abduction	Fork: shared start, two branches	Causal edge (what the branch changed)	Diff across branches, not just time. Ch 6.

Each step adds an operand. One snapshot pair gives one frame. Two pairs (actual and counterfactual) give one causal edge. N pairs across N branches give a typed subgraph. The pattern stays diff; the arity grows.
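
A toy of the branching idea only, not the Chapter 6 formalism: fork one shared start state into an actual branch and a counterfactual branch, diff the branches against each other, and read each changed variable as downstream of the one variable set by hand. The loose-terminal scenario and its branch-B values are invented for illustration.

```python
def diff(before, after):
    """Partition keys into figure (changed) and ground (held)."""
    figure, ground = {}, {}
    for key in dict.fromkeys([*before, *after]):  # ordered union of keys
        if key in before and key in after and before[key] == after[key]:
            ground[key] = after[key]
        else:
            figure[key] = (before.get(key), after.get(key))  # (was, now)
    return figure, ground

# Hypothetical fork. Both branches share this start state.
start = {"terminal": "loose", "crank": "none", "dash_lights": "dim", "fuel": "half"}

# Branch A (actual): leave everything alone.
# Branch B (counterfactual): tighten the terminal, then observe.
# The observed values in branch B are made up for illustration.
branch_a = dict(start)
branch_b = {**start, "terminal": "tight", "crank": "strong", "dash_lights": "bright"}

# Diff the two branches against each other, not a before against an after.
figure, ground = diff(branch_a, branch_b)

# Assumption: the intervention is the only variable set by hand, so every
# other changed variable is read as downstream of it.
intervention = "terminal"
edges = [(intervention, k) for k in figure if k != intervention]
print(edges)          # [('terminal', 'crank'), ('terminal', 'dash_lights')]
print(list(ground))   # ['fuel']
```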

Three witnesses

Three systems, three decades, three fields. Each encodes the diff. They are rarely presented as instances of one operation.

OBD-II (1996): hardcoded diff

OBD-II reads sensor states, diffs against expected values, and generates fault codes. (Real OBD-II adds thresholds, monitors, and enable conditions; "hardcoded diff" is a simplification of the core logic.) Vehicles have run this since 1996.

But OBD-II is hardcoded. The fault tree is hand-authored, the hypotheses enumerated in advance. An engineer decided which deviations map to which codes. If a failure mode wasn't anticipated, nothing fires. The primitive works; it just runs on a fixed table.

Python

# OBD-II style: hardcoded fault table
FAULT_TABLE = {
    "alternator_voltage": {
        "low":  ["P0562 - System Voltage Low",
                 "Check battery", "Check voltage regulator"],
        "high": ["P0563 - System Voltage High",
                 "Check voltage regulator", "Check wiring"],
    },
    "coolant_temp": {
        "high": ["P0217 - Engine Overtemp",
                 "Check thermostat", "Check coolant level"],
    },
}

def obd_diff(expected: dict, observed: dict) -> list[str]:
    """Hardcoded diff: look up deviations in the fault table."""
    codes = []
    for sensor, exp_val in expected.items():
        obs_val = observed.get(sensor)
        if obs_val != exp_val and sensor in FAULT_TABLE:
            if obs_val in FAULT_TABLE[sensor]:
                codes.extend(FAULT_TABLE[sensor][obs_val])
    return codes

expected = {"alternator_voltage": "normal", "coolant_temp": "normal"}
observed = {"alternator_voltage": "low",    "coolant_temp": "normal"}

for code in obd_diff(expected, observed):
    print(code)
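
The table above keys on symbolic readings such as "low". A real monitor first has to turn a raw number into that symbol, and only judges it while its enable conditions hold. A minimal sketch of that front end, where the voltage limits and the engine-running condition are assumptions rather than values from any OBD-II specification:

```python
# Assumed limits for charging-system voltage with the engine running.
# Real monitors use manufacturer-calibrated thresholds; these are made up.
VOLTAGE_LIMITS = (12.0, 15.0)   # (low, high) in volts

def classify(reading, limits, engine_running):
    """Turn a raw reading into the symbol the fault table keys on.

    Returns None when the (assumed) enable condition is not met, so the
    monitor stays silent instead of reporting a spurious deviation.
    """
    if not engine_running:
        return None
    low, high = limits
    if reading < low:
        return "low"
    if reading > high:
        return "high"
    return "normal"

print(classify(11.4, VOLTAGE_LIMITS, engine_running=True))    # low
print(classify(14.1, VOLTAGE_LIMITS, engine_running=True))    # normal
print(classify(15.6, VOLTAGE_LIMITS, engine_running=True))    # high
print(classify(11.4, VOLTAGE_LIMITS, engine_running=False))   # None
```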

The limitation is the table. Every hypothesis must be written down before the system ships. No table entry, no hypothesis. The diff is there; the inference is manual.
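
The failure case, sketched with a trimmed copy of the toy table above: a deviation the table has no row for passes through the diff and produces no output. The `coolant_temp: "low"` reading is a hypothetical deviation this toy table happens not to cover, not a claim about which codes real vehicles define.

```python
# Trimmed copy of the toy fault table: one row per tabled deviation.
FAULT_TABLE = {
    "alternator_voltage": {"low": ["P0562 - System Voltage Low"]},
    "coolant_temp":       {"high": ["P0217 - Engine Overtemp"]},
}

def obd_diff(expected, observed):
    """Same lookup as the toy above: only tabled deviations produce output."""
    codes = []
    for sensor, exp_val in expected.items():
        obs_val = observed.get(sensor)
        if obs_val != exp_val and obs_val in FAULT_TABLE.get(sensor, {}):
            codes.extend(FAULT_TABLE[sensor][obs_val])
    return codes

expected = {"alternator_voltage": "normal", "coolant_temp": "normal"}
observed = {"alternator_voltage": "normal", "coolant_temp": "low"}  # no row for "low"

changed = [s for s in expected if observed.get(s) != expected[s]]
print("changed:", changed)                      # changed: ['coolant_temp']
print("codes:", obd_diff(expected, observed))   # codes: []
```

The diff saw the change; the table had nothing to say about it.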

Facebook Infer (2009): automated diff

Infer was built on bi-abduction (Calcagno et al. 2009) and applied it to millions of lines of production code. Given what is known at a call site and what the called function requires (its precondition), it infers the frame (the memory the call doesn't touch) and the anti-frame (the missing memory that must exist for the call to be safe).

The figure is what the function modifies; the ground is what it leaves alone. Infer doesn't require the programmer to specify the ground. That's the “bi”: abduction in both directions, computing what must hold before and what the function preserves.
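
Calcagno et al. pose the bi-abduction question as solving A ∗ ?anti-frame ⊢ B ∗ ?frame: given what is known (A) and what is required (B), find the missing piece and the leftover piece. The sketch below is a deliberately crude toy under stated assumptions: a heap is a set of atomic points-to facts, ∗ is union of disjoint sets, and two facts either match exactly or mention different cells. Under those assumptions both unknowns reduce to set difference. Real bi-abduction works over symbolic heaps with inductive predicates such as list segments and needs a genuine entailment prover.

```python
# Toy model (assumption): a heap is a set of atomic points-to facts, and
# separating conjunction (*) is union of disjoint sets.

def biabduce(current: frozenset, required: frozenset) -> tuple[frozenset, frozenset]:
    """Solve  current * anti_frame |- required * frame  in the toy model."""
    anti_frame = required - current   # cells the callee needs that we never described
    frame = current - required        # cells the callee never touches
    return anti_frame, frame

# The caller knows x -> 1 and y -> 2; the callee's precondition needs x and z.
current  = frozenset({"x|->1", "y|->2"})
required = frozenset({"x|->1", "z|->_"})

anti_frame, frame = biabduce(current, required)
print(sorted(anti_frame))   # ['z|->_']
print(sorted(frame))        # ['y|->2']

# Both sides of the entailment now describe the same cells.
assert current | anti_frame == required | frame
```

The anti-frame is the half that reaches past the snapshot: `z` appears nowhere in what the caller knew.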

Infer moved the diff from a hand-authored table (OBD-II) to an automated inference engine. Same primitive, higher automation.

Invariant Risk Minimization (2019): learned diff

Invariant Risk Minimization (IRM; Arjovsky et al. 2019) uses environment variation to force figure/ground separation. Train a model across multiple environments. Features that predict the outcome in all environments are invariant (ground). Features that predict in some but not others are spurious (figure).

IRM diffs across environments rather than across time. Instead of comparing two snapshots of one system, it compares the same learning task under different conditions. Invariant features are ground; environment-specific features are figure. (Note the inversion: in IRM, the invariant features are the causal signal you want. Calling them "ground" follows the changed/unchanged definition but reverses the ordinary sense of "figure = thing of interest." The role assignment holds; the valence flips.)
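
A crude stand-in for the environment diff, assuming two hypothetical environments and two features: fit a least-squares slope of the outcome on each feature per environment, and call a feature ground when its slope agrees across environments. This is not IRM itself, which learns a representation under an invariance penalty (IRMv1 in the paper); it only illustrates the changed/held split across conditions.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Hypothetical data. In both environments y = 2 * causal. The spurious
# feature tracks y in environment A and runs against it in environment B.
envs = {
    "A": {"y": [2, 4, 6, 8], "causal": [1, 2, 3, 4], "spurious": [2, 4, 6, 8]},
    "B": {"y": [2, 4, 6, 8], "causal": [1, 2, 3, 4], "spurious": [8, 6, 4, 2]},
}

def split_by_stability(envs, features, tol=1e-9):
    """Figure: slope on y varies across environments. Ground: it holds."""
    figure, ground = {}, {}
    for f in features:
        slopes = [slope(env[f], env["y"]) for env in envs.values()]
        if max(slopes) - min(slopes) <= tol:
            ground[f] = slopes
        else:
            figure[f] = slopes
    return figure, ground

figure, ground = split_by_stability(envs, ["causal", "spurious"])
print("ground:", ground)   # ground: {'causal': [2.0, 2.0]}
print("figure:", figure)   # figure: {'spurious': [1.0, -1.0]}
```

Note the valence: the ground entry is the one you would keep.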

System	Year	Diff over	Figure	Ground
OBD-II	1996	Expected vs. observed sensor values	Fault codes (hand-authored)	Normal operating range (hand-authored)
Infer	2009	Precondition vs. postcondition	Modified heap (automated)	Frame: untouched heap (inferred)
IRM	2019	Environment A vs. environment B	Spurious features (learned)	Invariant features (learned)

Three encodings. Hand-authored table (OBD-II), automated inference (Infer), learned separation (IRM). IRM is the loosest analogue; it seeks invariant representations rather than literally diffing states. But the structural role holds: partition observations into signal and noise.

Code: full loop

Combine the primitive with hypothesis generation. Given a diff, produce candidate explanations by examining what the figure touches in a dependency graph.

Python

from dataclasses import dataclass

@dataclass
class Hypothesis:
    component: str
    reason: str
    testable: bool = True

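def diff(before: dict, after: dict) -> tuple[dict, dict]:
    """Partition state into figure (changed) and ground (held)."""
    all_keys = list(dict.fromkeys([*before, *after]))  # ordered union of keys
    figure, ground = {}, {}
    for key in all_keys:
        # Explicit membership test: an added or removed key is always figure,
        # even if its one recorded value is None.
        if key in before and key in after and before[key] == after[key]:
            ground[key] = after[key]
        else:
            figure[key] = (before.get(key), after.get(key))   # (was, now)
    return figure, ground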

def abduct(figure: dict, dependencies: dict) -> list[Hypothesis]:
    """Generate hypotheses from the diff.

    dependencies: maps each state variable to components
    that could cause it to change.
    """
    candidates = []
    seen = set()
    for key in figure:
        for component in dependencies.get(key, []):
            if component not in seen:
                seen.add(component)
                was, now = figure[key]
                candidates.append(Hypothesis(
                    component=component,
                    reason=f"{key} changed ({was} -> {now})",
                ))
    return candidates


# --- The mechanic scenario ---

before = {
    "crank": "strong", "engine": "starts",
    "dash_lights": "bright", "starter_sound": "spins",
    "battery_rest": "12.6V", "fuel": "half", "security": "off",
}

after = {
    "crank": "none", "engine": "no-start",
    "dash_lights": "dim", "starter_sound": "single click",
    "battery_rest": "12.1V", "fuel": "half", "security": "off",
}

# Which components can cause each state variable to change?
dependencies = {
    "crank":         ["battery", "wiring", "starter_motor", "ignition_switch"],
    "dash_lights":   ["battery", "wiring"],
    "starter_sound": ["starter_motor", "battery", "wiring"],
    "battery_rest":  ["battery"],
    # engine "no-start" is downstream of "no crank" — no independent candidates
}

figure, ground = diff(before, after)
hypotheses = abduct(figure, dependencies)

print("Diff result:")
print(f"  Figure: {list(figure.keys())}")
print(f"  Ground: {list(ground.keys())}")
print(f"\n{len(hypotheses)} hypotheses generated:")
for h in hypotheses:
    print(f"  [{h.component}] {h.reason}")

Four candidates from the figure: battery, wiring, starter motor, ignition switch. They don't weigh equally. The battery is upstream of every changed variable the map covers (the crank, the lights, the click, the resting voltage), and wiring of three of those four; the starter motor explains the crank and the click, the ignition switch only the crank. The ground (fuel unchanged, security still off) keeps the fuel system and the immobilizer off the list. Check the battery and its connections first.

Notice what the code does not do. It does not test the hypotheses, rank them, or estimate their probability. It generates them. The diff is a hypothesis-generation primitive. Testing is induction. Ranking is economy of research (Chapter 8). The diff names the candidates.

What breaks

The diff requires you to know what to observe. Every variable was chosen by someone: the mechanic who checked six gauges, the engineer who wired six sensors, the programmer who logged six fields. If the relevant state lives in a variable nobody snapshotted, the diff misses it.

Return to the mechanic. She snapshotted resting voltage but never voltage under load. If the real cause is a battery that holds 12.1V at rest yet collapses to 8V the instant the starter pulls current, the diff cannot find it. Load voltage isn't in either snapshot. Not in the figure, not in the ground. Absent.

Python

def diff(before, after):
    all_keys = list(dict.fromkeys([*before, *after]))  # ordered union of keys
    figure, ground = {}, {}
    for key in all_keys:
        if key in before and key in after and before[key] == after[key]:
            ground[key] = after[key]
        else:
            figure[key] = (before.get(key), after.get(key))  # (was, now)
    return figure, ground

# The variable that matters isn't in the snapshot.
before = {
    "crank": "strong", "engine": "starts",
    "dash_lights": "bright", "starter_sound": "spins",
    "battery_rest": "12.6V", "fuel": "half",
    # "battery_load": "10.8V" — not measured
}

after = {
    "crank": "none", "engine": "no-start",
    "dash_lights": "dim", "starter_sound": "single click",
    "battery_rest": "12.1V", "fuel": "half",
    # "battery_load": "8.1V" — would have been the smoking gun, but we missed it
}

figure, ground = diff(before, after)

# The diff correctly reports what changed among observed variables.
# But the real cause (battery_load: 10.8V -> 8.1V) is invisible.
# Resting voltage barely moved; the collapse under load was never measured.
print("Figure:", list(figure.keys()))
print("Ground:", list(ground.keys()))
print("Battery load voltage: not in snapshot. Hypothesis space is incomplete.")

This is the structural limitation. The unary diff partitions observed state into figure and ground. It cannot reason about unobserved state. The hypothesis space is bounded by what you chose to measure.
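
One cheap guard against the known unknowns, sketched under an assumption: if your dependency map names variables that neither snapshot carries, list them before trusting the diff. The `battery_load` entry is the hypothetical unmeasured variable from the example above. The guard cannot catch a variable nobody thought to name, which is exactly the limitation this paragraph describes.

```python
# Hypothetical dependency map that mentions a variable the snapshots lack.
dependencies = {
    "crank":        ["battery", "wiring", "starter_motor", "ignition_switch"],
    "battery_rest": ["battery"],
    "battery_load": ["battery"],   # known to matter, never measured
}
snapshot_keys = {"crank", "engine", "dash_lights", "starter_sound",
                 "battery_rest", "fuel"}

# Variables the model says can carry a cause but the snapshots never recorded.
unmeasured = sorted(set(dependencies) - snapshot_keys)
print("Named in the model but absent from both snapshots:", unmeasured)
# Named in the model but absent from both snapshots: ['battery_load']
```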

OBD-II shows this concretely. If a failure involves a sensor the ECU doesn't monitor, no code fires. Engineers add more sensors, but you can never instrument everything. Some state will always be unmeasured.

Bi-abduction attacks this from the other side. Alongside the frame (the state the operation leaves untouched), it infers the anti-frame: state that must exist for the result to be valid even though nothing in the current description mentions it. Reasoning from what the operation requires, it can name state never explicitly observed. The diff goes from "compare what you measured" to "infer what you must have missed."

Exercises

💻 marks exercises meant for a keyboard. ★ marks open-ended problems with no single right answer.

4.1 A variable lands in the figure. The chapter warns it may be a cause, an effect, a symptom, or coincident noise. Using the no-crank scenario's five figure entries, give a one-line example of each of the four.

4.2 The Case of the Evening Wi-Fi. Before: the laptop holds its connection all day, the microwave is idle, router uptime reads three days, the neighbors' networks are visible. After, every evening around six: the connection drops, the microwave is running, router uptime is unchanged, the neighbors' networks are visible. Compute figure and ground by hand. Build a small dependency map and generate the candidate hypotheses. Then name one variable missing from both snapshots that could carry the real cause, and say how you would measure it.

4.3 Pick a machine you use daily (espresso machine, bicycle, printer, a login flow) and write before and after snapshots of six variables for a failure you have actually seen. Build the dependency map, run the abduction by hand, and check that the true cause appears among the candidates. Then delete one measurement from the snapshots so that the true cause becomes invisible, and name the gauge you just removed.

4.4 💻 The chapter's abduct() treats all candidates equally, while the prose ranks the battery first because it sits upstream of every changed variable. Extend abduct() to score each component by how many figure entries it can explain, sort by score, and print the ranking. Confirm that the battery wins and the ignition switch comes last.

4.5 ★ For the last thing in your life that broke, write down the two snapshots you implicitly compared, the figure, the ground, and the variable you never measured. Keep the list; Chapter 7 asks where your gauges came from.

Sources

Rubin 1915	Synsoplevede Figurer. Figure-ground segregation in visual perception. The perceptual ancestor of diff.
SAE J1962 (1996)	OBD-II diagnostic connector standard; trouble code conventions are specified separately in SAE J2012. The diff, hardcoded since 1996.
Calcagno et al. 2009	"Compositional Shape Analysis by Means of Bi-Abduction." POPL. The frame inference engine behind Facebook Infer. Automated figure/ground from separation logic.
Arjovsky et al. 2019	"Invariant Risk Minimization." Environment variation as the lever for figure/ground separation. Learned diff across training conditions.
O'Hearn 2019	"Incorrectness Logic." POPL. Flip the polarity of the diff: under-approximate bugs instead of over-approximating correctness.
Ernst et al. 2001	"Dynamically Discovering Likely Program Invariants to Support Program Evolution." Daikon: infer invariants (ground) from observed execution traces. More samples, sharper ground.

Neighbors

Methodeutics
Ch 2: Security and Uberty — the tradeoff that makes abduction fertile
Ch 8: Economy of Research — selecting among the hypotheses this chapter generates
Abduction — the blog post this chapter formalizes

External

Facebook Infer — bi-abduction in production
OBD-II (Wikipedia)
Arjovsky et al. 2019 — Invariant Risk Minimization (arXiv)

← Eight names by june.kim Bi-abduction →