
Causality

Judea Pearl · 2000 · Models, Reasoning, and Inference · Cambridge University Press

Correlation is what you see. Causation is what would happen if you reached in and changed something. Pearl wrote down the difference: P(Y | X) versus P(Y | do(X)). One symbol, and a confusion that had haunted statistics for a century became visible.

Pearl's causal hierarchy: three rungs, three questions.
  1. Seeing. "What does observing X tell me about Y?" P(Y | X)
  2. Doing. "What happens if I force X to take this value?" P(Y | do(X))
  3. Imagining. "Given what happened, what would have happened if X had been different?" P(Y_x | X', Y')
Each rung needs strictly more than the rung below: no amount of pure observation can answer a "doing" question on its own.

The argument

One way to read the lineage (my framing, not necessarily Fisher's or Pearl's self-understanding): Mill gave us the methods of difference and agreement, telling us to vary one thing and hold the rest constant. Fisher gave us randomization, which can be read as Mill's method of difference implemented stochastically. Path analysis, structural equation modeling, and twentieth-century epidemiology built more on top, each in its own dialect. What was missing was a shared formal language for asking causal questions about systems you couldn't randomize: economies, climates, populations, history. The canonical default answer was "you can't, without an experiment," and the workarounds were specific to each field. Pearl supplies a unified notation and a sharper answer: sometimes yes, sometimes no, and given an assumed graph, here is the procedure for telling which case you're in.

The central object is the do-operator. P(Y | X = x) is what you observe when X happens to take the value x: an association. P(Y | do(X = x)) is what you would observe if you reached in and forced X to be x, severing every other influence on X. The two are different distributions, and one of the central facts of statistics is that they often disagree. Smoking is correlated with lung cancer. Forcing someone to smoke would also cause lung cancer. The two probabilities point the same direction here, but that agreement is an empirical fact about smoking, not a definitional one about probability. Whenever they diverge (confounders, selection effects, reverse causation, colliders) and we read the seeing-probability as if it were the doing-probability, we make exactly the kind of mistake that builds a literature of false findings.
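The gap between the two distributions can be made concrete in a few lines of simulation. The following sketch uses an invented toy model (all numbers made up for illustration) in which genes raise both the chance of smoking and the chance of cancer, so the observed conditional overstates the interventional one:

```python
import random

random.seed(0)

# Toy structural model, invented numbers: genes -> smoking, genes -> cancer,
# smoking -> cancer. Passing intervene forces smoking's value, severing the
# genes -> smoking arrow (the "reaching in" of the do-operator).
def sample(intervene=None):
    genes = random.random() < 0.3
    if intervene is None:
        smoking = random.random() < (0.8 if genes else 0.2)  # confounded assignment
    else:
        smoking = intervene                                  # do(smoking)
    cancer = random.random() < 0.05 + 0.10 * smoking + 0.20 * genes
    return smoking, cancer

N = 200_000
obs = [sample() for _ in range(N)]

# Seeing: P(cancer | smoking=1) mixes the direct effect with the genes backdoor.
smokers = [c for s, c in obs if s]
p_see = sum(smokers) / len(smokers)

# Doing: P(cancer | do(smoking=1)) keeps only the direct effect.
p_do = sum(sample(intervene=True)[1] for _ in range(N)) / N

print(f"P(cancer | smoking=1)     = {p_see:.3f}")  # about 0.276
print(f"P(cancer | do(smoking=1)) = {p_do:.3f}")   # about 0.210
```

In this model the analytic values are 0.276 and 0.210: same data-generating process, two different questions, two different answers.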

The deeper move is the do-calculus: three rules, written in the language of directed acyclic graphs, that tell you whether a particular interventional probability can be computed from observational data, given an assumed causal graph. The graph is not delivered by the calculus; you supply it from substantive knowledge of the system. What the calculus delivers, conditional on the graph, is a verdict: here is a closed-form expression that identifies the causal effect, or here is a proof that no such expression exists from the variables you measured. Sometimes the verdict is "you cannot answer this question from your data, full stop." No extra covariate would help; the only path forward is a different study design. Turning that judgment call into a decision procedure is the load-bearing innovation. Pen and paper, in the strict sense.
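When the backdoor criterion is satisfied, the identifying expression is one line of arithmetic: P(y | do(x)) = Σ_z P(y | x, z) P(z), with the confounder weighted by its marginal rather than by P(z | x). A sketch on an assumed toy parameterization of the smoking graph (invented numbers, genes observed) shows the adjustment recovering the interventional probability from purely observational quantities:

```python
# Toy parameterization of Genes -> {Smoking, Cancer}, Smoking -> Cancer.
# All numbers are invented for illustration, not empirical estimates.
pz    = {1: 0.3, 0: 0.7}                            # P(genes)
px_z  = {1: 0.8, 0: 0.2}                            # P(smoking=1 | genes)
py_xz = lambda x, z: 0.05 + 0.10 * x + 0.20 * z     # P(cancer=1 | smoking=x, genes=z)

# Seeing: genes weighted by P(genes | smoking=1), so the backdoor stays open.
p_x   = sum(px_z[z] * pz[z] for z in (0, 1))
p_see = sum(py_xz(1, z) * px_z[z] * pz[z] / p_x for z in (0, 1))

# Backdoor adjustment: P(cancer | do(smoking=1)) = sum_z P(cancer | smoking=1, z) P(z).
# Every term on the right is observational; the assumed graph licenses the equality.
p_do = sum(py_xz(1, z) * pz[z] for z in (0, 1))

print(round(p_see, 4))  # 0.2763  (association, backdoor open)
print(round(p_do, 4))   # 0.21    (causal effect, backdoor closed)
```

The calculus's contribution is not the formula itself but the verdict that this formula, with this adjustment set, is valid for this graph.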

[Figure: the same three variables, two different questions. Left panel, "Seeing: P(cancer | smoking)": a DAG with Genes → Smoking, Genes → Cancer, and Smoking → Cancer. Two paths run from smoking to cancer, a direct effect and a backdoor through genes; the observed association mixes both. Right panel, "Doing: P(cancer | do(smoking))": the same DAG with the arrow into Smoking severed by the intervention. The backdoor is closed; the only remaining smoking → cancer path is the direct causal effect. Caption: draw the DAG, then operate on it.]

Key concepts

Do-operator, do(X = x). A formal symbol for intervention: P(Y | do(X = x)) is the distribution of Y when X is forced, not merely observed, to take the value x.

Causal hierarchy. Three rungs: seeing (association), doing (intervention), imagining (counterfactual). Each rung answers questions the rung below cannot, no matter how much data you collect.

DAG (directed acyclic graph). A diagram of which variables can causally influence which others. The graph is the assumption; the calculus is what you do with it.

Backdoor criterion. A graphical test for when conditioning on a set of observed variables is enough to identify a causal effect, given an assumed causal graph. If your variables pass it, you can estimate the effect. If they don't, sometimes another variable in the graph closes the gap and sometimes no observational adjustment can, and the calculus tells you which case you're in.

Structural causal model. A system of equations plus a DAG that fully specifies the data-generating process. Counterfactual queries are answered by surgically modifying the model.
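"Surgically modifying the model" is a concrete three-step recipe: abduction (infer the background noise from what was observed), action (replace one structural equation with a constant), prediction (rerun the mutilated model with the same noise). A minimal sketch on an invented two-equation linear SCM:

```python
# Invented two-equation structural causal model:
#   X := U_X
#   Y := 2*X + U_Y

def scm(u_x, u_y, do_x=None):
    x = u_x if do_x is None else do_x   # surgery: replace X's equation by a constant
    y = 2 * x + u_y
    return x, y

# We observed X=1, Y=3. What would Y have been had X been 0?
x_obs, y_obs = 1, 3

# 1. Abduction: infer the noise terms consistent with the observation.
u_x = x_obs
u_y = y_obs - 2 * x_obs                 # U_Y = 1

# 2. Action: sever X's equation and set X=0.
# 3. Prediction: rerun the model with the same noise.
_, y_cf = scm(u_x, u_y, do_x=0)
print(y_cf)  # 1: had X been 0, Y would have been 1
```

Note that the counterfactual reuses the noise inferred from the actual world; that is what distinguishes rung three from simply computing P(Y | do(X = 0)).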

Connections

Mill stated the methods of causal inference informally; Pearl gives them a formal calculus. Where Mill said "vary one thing, hold the rest constant," Pearl says "given the causal diagram you're assuming, here is whether a sufficient adjustment set exists, what it is when it does, and a proof that no such set exists when it doesn't." Mill's intuitions become theorems, and the cases where Mill's program is unfollowable become theorems too.

Fisher's randomization is the gold standard for the same reason: when you randomize the assignment of X, you mechanically sever the arrows from any confounders to X, which means P(Y | X) and P(Y | do(X)) become equal by construction. Pearl's contribution is showing that randomization is not the only way to achieve that equality. Sometimes you can achieve it with the right conditioning set, and the do-calculus tells you when. Randomization is one solution to the identification problem. The do-calculus is the general theory of the problem.
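The "equal by construction" claim can be checked directly. In the same kind of invented confounded toy model, replacing the confounded assignment with a coin flip pulls the observed conditional onto the interventional value:

```python
import random

random.seed(1)

# Invented toy model: genes confound smoking and cancer.
def draw(randomized):
    genes = random.random() < 0.3
    if randomized:
        smoking = random.random() < 0.5                      # coin flip: no arrow into smoking
    else:
        smoking = random.random() < (0.8 if genes else 0.2)  # confounded assignment
    cancer = random.random() < 0.05 + 0.10 * smoking + 0.20 * genes
    return smoking, cancer

def p_cancer_among_smokers(samples):
    smokers = [c for s, c in samples if s]
    return sum(smokers) / len(smokers)

N = 200_000
p_obs  = p_cancer_among_smokers([draw(False) for _ in range(N)])
p_rand = p_cancer_among_smokers([draw(True) for _ in range(N)])

# The true P(cancer | do(smoking=1)) here is 0.05 + 0.10 + 0.20 * 0.3 = 0.21.
print(f"observational P(cancer | smoking=1) = {p_obs:.3f}")   # about 0.276
print(f"randomized    P(cancer | smoking=1) = {p_rand:.3f}")  # about 0.210
```

Under randomization the simple conditional is the causal effect; no adjustment, and no graph, is needed for identification.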

The other half of modern causal inference is the potential outcomes framework, originally written down by Jerzy Neyman in 1923 (in Polish, mostly forgotten outside Poland for fifty years) and rediscovered and formalized by Donald Rubin in 1974. Rubin's notation and Pearl's graphs are largely equivalent for the questions both can express, but the working idioms differ. Pearl's diagrams are easier to draw on the back of an envelope; Rubin's potential outcomes are closer to standard statistical estimators. The credibility revolution in econometrics (Angrist, Imbens, Card, 2021 Nobel) ran principally on the potential-outcomes side: natural experiments, instrumental variables, regression discontinuity, difference-in-differences. Many economists were explicitly cool on DAGs. Pearl's contribution to that revolution was indirect: a parallel formalization of the same underlying logic, more visible in epidemiology, computer science, and recently in causal machine learning than in mainstream econometrics.

Ioannidis's diagnosis is broader than causal confusion: it covers low priors, low power, multiplicity, analytic flexibility, and publication bias. But causal confusion is one strand inside it, and a particularly damaging one, because the resulting findings often do replicate. They're causally wrong, not statistically noisy. Hormone replacement therapy and heart disease was the textbook case: well-conducted observational studies, internally consistent, externally replicated, and contradicted by the WHI randomized trial when the question finally got asked the right way. Pearl's framework does not fix the social structure that produces such literatures, but it gives you the language to write down when an observational design is and isn't licensed to make a causal claim.

Mayo's severe testing asks whether a test could have caught the hypothesis being wrong. For a causal hypothesis, that question only has a meaningful answer if you can write the hypothesis down precisely enough to derive what "different" data would look like. The do-calculus is the apparatus for writing causal hypotheses down. Severity gives you the philosophical principle; Pearl gives you the syntax that makes the principle operational.

Why this is the most pen-and-paper thing in modern statistics

Causal inference is the area of statistics where the objects are explicitly notational. The do-operator is a symbol. The DAG is a diagram you can draw on a napkin. The backdoor criterion is a graph-theoretic procedure you can run by hand on small examples. The front-door adjustment is a closed-form expression. Pearl's claim is not that he has discovered new facts about the world; it is that he has invented a language in which previously unwriteable questions can be written, and a calculus for resolving them. The field's progress since 2000 is largely the progress of working out what that language can express.
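The front-door case is worth seeing worked, because it identifies a causal effect even when the confounder is unobserved. A sketch on an invented graph U → X, U → Y, X → M, M → Y (U hidden, M a mediator shielded from U), checking the front-door formula P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x') against ground truth computed by actual surgery:

```python
from itertools import product

# Invented parameters for the front-door graph: U -> X, U -> Y, X -> M, M -> Y.
p_u    = lambda u: 0.5
p_x_u  = lambda x, u: (0.7 if u else 0.2) if x else 1 - (0.7 if u else 0.2)
p_m_x  = lambda m, x: (0.9 if x else 0.1) if m else 1 - (0.9 if x else 0.1)
p_y_mu = lambda y, m, u: (0.1 + 0.5*m + 0.3*u) if y else 1 - (0.1 + 0.5*m + 0.3*u)

B = (0, 1)
joint = {(u, x, m, y): p_u(u) * p_x_u(x, u) * p_m_x(m, x) * p_y_mu(y, m, u)
         for u, x, m, y in product(B, B, B, B)}

# Observational quantities only: U is marginalized out, as an analyst would see it.
def P(pred):
    return sum(p for k, p in joint.items() if pred(*k))

p_x1 = P(lambda u, x, m, y: x == 1)
p_m_given_x1 = {m0: P(lambda u, x, m, y: x == 1 and m == m0) / p_x1 for m0 in B}
def p_y1_given_mx(m0, x0):
    return P(lambda u, x, m, y: m == m0 and x == x0 and y == 1) / \
           P(lambda u, x, m, y: m == m0 and x == x0)
p_xmarg = {x0: P(lambda u, x, m, y: x == x0) for x0 in B}

# Front-door formula, from observational terms alone.
front_door = sum(p_m_given_x1[m0] *
                 sum(p_y1_given_mx(m0, x0) * p_xmarg[x0] for x0 in B)
                 for m0 in B)

# Ground truth by surgery: force X=1, keep U's influence on Y.
truth = sum(p_u(u) * p_m_x(m0, 1) * (0.1 + 0.5*m0 + 0.3*u)
            for u in B for m0 in B)

print(f"{front_door:.4f} {truth:.4f}")  # 0.7000 0.7000
```

Small enough to run by hand on a napkin, which is the point: the identification argument is graph-theoretic, and the resulting estimator is closed-form.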

The transmissibility is the point. If you can write a causal question in Pearl's notation, and supply the causal graph you're assuming, then another mind can pick up the page and either solve the question, prove it unsolvable from your data, or pinpoint which additional measurement would help. The objectivity here is conditional, not absolute: given the same graph and the same data, different readers reach the same verdict. The graph itself remains a substantive assumption you have to defend. That is the methodologically objective sense of "foundation," verifiable between minds once the assumptions are written down, and it is the kind of foundation you can build on.

The technical reference is Causality (Cambridge, 2nd ed. 2009). The popular treatment is The Book of Why (Pearl & Mackenzie, 2018). The cleanest introduction for working analysts is Causal Inference: The Mixtape (Cunningham, free online). For the econometrics-flavored counterpart, see Mostly Harmless Econometrics (Angrist & Pischke, 2009).
Neighbors
  • 🔬 Mill 1843 — the methods of difference and agreement Pearl formalizes
  • 🔬 Fisher 1935 — randomization as one (not the only) path to identification
  • 🔬 Ioannidis 2005 — the false-findings diagnosis; causal confusion is one strand of it, the one Pearl's notation makes visible
  • 🔬 Mayo 2018 — severity needs causal hypotheses written down precisely; the do-calculus is how you write them
