It would be wrong to say that Judea Pearl knows more about causal inference than anyone else (I can think of some rivals very close to where I'm writing this), but he certainly knows a lot, and has worked tirelessly to formulate and spread the modern way of thinking about the subject, centered around graphical models and their associated structural equations. I remember spending many happy hours with his book Causality when it came out in 2000, and look forward to spending more with the new edition, which is making its way to me through the mail now. In the meantime, however, there is what he describes as "A new survey paper, gently summarizing everything I know about causation (in only 43 pages)":
The paper assumes a reader who's reasonably well-grounded in statistics, though not necessarily in the causal-inference literature. (Of such readers, I imagine applied economists might have more unlearning to do than most, because they will keep asking "but when do I start estimating beta?") It's not ideally calibrated for a reader coming from, say, machine learning.
One theme running through the paper is the futility of trying to define causality in purely probabilistic terms, and the fact that cases where it looks like one can do so are really cases where causal assumptions have been smuggled in. Another is that once you realize counterfactual or mechanistic assumptions are needed, the graphical-models/structural equation framework makes it immensely easier to reason about them than does the rival "potential outcomes" framework. In fact, the objects which the potential outcomes framework takes as its primitives can be constructed within the structural framework, so the correct part of the former is a subset of the latter. And by reasoning on graphical models it is easy to see that confounding can be introduced by "controlling for" the wrong variables, something explicitly denied by leading members of the potential-outcomes school. (Pearl quotes them making this mistake, and manages to pull off a more-in-sorrow-than-in-glee tone while doing so.) Mostly, however, the paper is about showing off what can be done within the new framework, which is really pretty impressive, and ought to be part of the standard tool-kit of data analysis. If you are not already familiar with it, this is an excellent place to begin, and if you are you will enjoy the elegant and comprehensive presentation.
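To make the point about "controlling for" the wrong variable concrete, here is a minimal simulation sketch. It assumes a hypothetical linear-Gaussian "M-structure" of my own choosing, not an example taken from Pearl's paper: two unobserved causes `u1` and `u2`, with `u1` driving both `x` and `z`, and `u2` driving both `z` and `y`. By construction `x` has no effect on `y`. All variable names, coefficients, and the sample size are illustrative assumptions.

```python
import numpy as np


def simulate_m_bias(n=200_000, seed=0):
    """Return the OLS coefficient on x, without and with adjustment for z.

    Hypothetical structural equations (all noise terms standard normal):
        x = u1 + noise
        y = u2 + noise        # note: y does NOT depend on x
        z = u1 + u2 + noise   # z is a collider on the path x <- u1 -> z <- u2 -> y
    """
    rng = np.random.default_rng(seed)
    u1 = rng.normal(size=n)
    u2 = rng.normal(size=n)
    x = u1 + rng.normal(size=n)
    y = u2 + rng.normal(size=n)
    z = u1 + u2 + rng.normal(size=n)

    def coef_on_x(*covariates):
        # Least-squares fit of y on an intercept, x, and any extra covariates.
        design = np.column_stack([np.ones(n), x, *covariates])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return beta[1]  # coefficient on x

    return coef_on_x(), coef_on_x(z)


unadjusted, adjusted = simulate_m_bias()
print(f"coefficient on x, no adjustment:   {unadjusted:+.3f}")
print(f"coefficient on x, adjusting for z: {adjusted:+.3f}")
```

Under these assumptions the unadjusted coefficient should hover near zero, which is the true causal effect, while the adjusted one should land near -0.2 (the population value implied by the variances and covariances above). Conditioning on `z` opens the path through `u1` and `u2`, so the "control" is what creates the spurious association.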
Looking back over what I write in this blog, I feel like, on the one hand, there's too little of it lately, and on the other hand, it's too tilted towards negative, critical stuff. While not regretting at all being negative and critical about stupid ideas that need to be criticized (or, really, pulverized), I will try to expand and balance my output by posting at least once a week on some good science. We'll see how this goes.
Posted at September 25, 2009 10:12 | permanent link