Graphical Causal Models09 Jul 2021 11:50
Everyone who takes basic statistics has it drilled into them that "correlation is not causation." (When I took psych. 1, the professor said he hoped that, if he were to come to us on our death-beds and prompt us with "Correlation is," we would all respond "not causation.") This is a problem, because one can infer correlation from data, and would like to be able to make inferences about causation. There are typically two ways out of this. One is to perform an experiment, preferably a randomized double-blind experiment, to eliminate accidental sources of correlation, common causes, etc. That's nice when you can do it, but impossible with supernovae, and not even easy with people. The other out is to look for correlations, say that of course they don't equal causations, and then act as if they did anyway. The technical names for this latter course of action are "linear regression" and "analysis of variance," and they form the core of applied quantitative social science, e.g., The Bell Curve.
Graphical models are, in part, a way of escaping from this impasse.
The basic idea is as follows. You have a bunch of variables, and you want to represent the causal relationships, or at least the probabilistic dependencies, between them. You do so by means of a graph. Each node in the graph stands for a variable. If variable A is a cause of B, then an arrow runs from A to B. If A is a cause of B, we also say that A is one of B's parents, and B one of A's children. If there is a causal path from A to B, then A is an ancestor of B, and B is a descendant of A. If a variable has no parents in the graph, it is exogenous, otherwise it is endogenous.
Part of what we mean by "cause" is that, when we know the immediate causes, the remoter causes are irrelevant --- given the parents, remoter ancestors don't matter. The standard example is that applying a flame to a piece of cotton will cause it to burn, whether the flame came from a match, spark, lighter or what-not. Probabilistically, this is a conditional indepedence property, or a Markov property: a variable is independent of its ancestors conditional on its parents. In fact, given its parents, its children, and its childrens' other parents, a variable is conditionally independent of all other variables. This is called the graphical or causal Markov property. When this holds, we can factor the joint probability distribution for all the variables into the product of the distribution of the exogenous variables, and the conditional distribution for each endogenous variable given its parents.
(You may be wondering what happens if A is a parent of B and B is a parent of A, as can happen when there is feedback between the variables. This leads to difficulties, traditionally dealt with by explicitly limiting the discussion to acyclic graphs. I shall follow this wise precedent here.)
Now, there are certain rules which let us infer conditional independence relations from each other. For instance, if X is independent of the combination of Y and W, given Z, then X is indepdent of Y alone given Z. So, if we have a graph which obeys the causal Markov condition, there are generally other conditional independence relations which follow from the basic ones. If these are the only conditional indepences which hold in the distribution, it is said to be faithful to the graph (or vice versa); otherwise it is unfaithful. For a graph to be Markov and unfaithful, there must (as it were) be an elaborate conspiracy among the conditional distributions, so elaborate that it will generally be destroyed by any change in any of those distributions. So faithfulness is a robust property.
This may sound pretty arcane, but that's just because it is arcane. The point, however, is that if you can make the three assumptions above (no causal cycles, Markov property, faithfulness), you're in business in a really remarkable way. There are very powerful statistical techniques that will let you infer the causal structure connecting your variables. This comes in two flavors. One is the Bayesian way: cook up a prior distribution over all possible causal graphs; compute the likelihood of the data under each graph; update your distribution over graphs; iterate. This is generally computationally intractable, assuming you can come up with a meaningful prior in the first place. The other approach is to use tests for conditional independence to eliminate possible connections between variables, and so to narrow down the range of candidate structures; it is basically frequentist, and can be shown, under a broad range of circumstances, to be asymptotically reliable.
Once you have your causal graph --- whether through estimation or through simply being handed one --- you can do lots of great things with it, like predict the effects of manipulating some of the variables, or make backward inferences from effects to causes. Of course, if the graph is big, doing the necessary calculations can be very troublesome in itself, and so people work on approximation methods and even ways of doing statistical inference on models of statistical distributions...
It's probably obvious I think this is incredibly neat, and even one of the most important ideas to come out of machine learning. Of course it doesn't really solve the problem of establishing causal relations, in the way Hume objected to; it says, assuming there are causal relations, of a certain stochastic form, and that these are stable, then they can be learned. But that, and the more general questions of what we ought to mean by "cause", deserve a notebook of their own.
Things I want to understand better: Structure discovery. (Once you know the graph, parameter estimation, and even nonparametric estimation, is, by construction, straightforward.) Statistical learning theory for graphical models. (The paper by Janzing and Herrmann is a good start). How to treat systems with feedback? How to treat dynamical systems and time series? How does all of this fit together with computational mechanics?
--- Causal discovery algorithms are of enough interest to me that they deserve their own notebook, which should take a lot of material from this one, but for now that's mostly just a reference list.
- Recommended, more general:
- Clark Glymour, The Mind's Arrows: Bayes Nets and Graphical Causal Models in Psychology [Mini-review]
- Michael Irwin Jordan (ed.), Learning in Graphical Models
- Jordan and Sejnowski (eds.), Graphical Models [Nice collection of papers from Neural Computation]
- Judea Pearl
- "Causal Inference in Statistics: An Overview", Statistics Surveys 3 (2009): 96--146 [PDF]
- Causality: Models, Reasoning and Inference
- Peter Spirtes, Clark Glymour and Richard Scheines, Causation, Prediction, and Search [Comments]
- Recommended, more specialized:
- Vanessa Didelez, "Causal Reasoning for Events in Continuous Time: A Decision–Theoretic Approach", UAI 2015
- Michael Eichler and Vanessa Didelez, "Causal Reasoning in Graphical Time Series Models", UAI 2007, arxiv:1206.5246
- Antti Hyttinen, Frederick Eberhardt and Matti Järvisalo, "Do-calculus when the True Graph Is Unknown", UAI 2015
- Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", Journal of Machine Learning Research 11 (2010): 1709--1731
- John C. Loehlin, Latent Variable Models: An Introduction to Factor, Path, and Structural Analysis [An intro. to old-school linear latent-variable models, especially of the sort used by psychologists. Good in its own domain, but does not make enough contact with modern graphical models.]
- Marloes H. Maathuis, Diego Colombo, Markus Kalisch and Peter Bühlmann, "Predicting causal effects in large-scale systems from observational data", Nature Methods 7 (2010): 247--248 [PDF reprints of paper and supplementary information]
- Marloes H. Maathuis, Markus Kalisch, Peter Bühlmann, "Estimating high-dimensional intervention effects from observational data", Annals of Statistics 37 (2009): 3133--31654, arxiv:0810.4214
- Emilija Perković, Johannes Textor, Markus Kalisch and Marloes H. Matthuis, "A Complete Generalized Adjustment Criterion", UAI 2015, arxiv:1507.01524
- Jonas Peters, Peter Bühlmann, "Structural Intervention Distance (SID) for Evaluating Causal Graphs", arxiv:1306.1043
- Christopher J. Quinn, Todd P. Coleman, Negar Kiyavash, "Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes", arxiv:1101.5108
- Christopher J. Quinn, Negar Kiyavash, Todd P. Coleman, "Directed Information Graphs", arxiv:1204.2003
- Thomas S. Richardson and James M. Robins, "Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality" [PDF preprint, 140+ pp.; thanks to Betsy Ogburn for telling me about this]
- James Robins and Thomas Richardson, "Alternative Graphical Causal Models and the Identification of Direct Effects" [PDF preprint]
- Modesty forbids me to recommend:
- CRS, Advanced Data Analysis from an Elementary Point of View, Part III (chapters on causal inference for statistics students)
- CRS and Andrew C. Thomas, "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies", arxiv:1004.4704 [Less-technical weblog version]
- Octavio Mesner, Alex Davis, Elizabeth Casman, Hyagriv Simhan, CRS, Lauren Keenan-Devlin, Ann Borders and Tamar Krishnamurt, "Using graph learning to understand adverse pregnancy outcomes and stress pathways", PLoS One 14 (2019): e0223319
- To read:
- Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", arxiv:1107.1283 [This sounds very much like Spearman's "tetrad equations" from 100 years ago!]
- Holly Andersen, "When to expect violations of causal faithfulness and why it matters", phil-sci/9204
- Clive G. Bowsher, "Stochastic kinetic models: Dynamic independence, modularity and graphs", Annals of Statistics 38 (2010): 2242--2281
- Elias Chaibub Neto, Mark P. Keller, Alan D. Attie, and Brian S. Yandell, "Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes", Annals of Applied Statistics 4 (2010): 320--339
- David Cox and Nanny Warmuth, Multivariate Dependcencies: Models, Analysis, and Interpretation
- Vanessa Didelez, "Graphical models for marked point processes based on local independence", arxiv:0710.5874
- Vanessa Didelez, Svend Kreiner and Niels Keiding, "Graphical Models for Inference Under Outcome-Dependent Sampling", Statistical Science 25 (2010): 368--387, arxiv:1101.0901
- Mathias Drton, Rina Foygel, and Seth Sullivant, "Global identifiability of linear structural equation models", Annals of Statistics 39 (2011): 865--886
- Seif Eldawlatly, Yang Zhou, Rong Jin and Karim G. Oweiss, "On the Use of Dynamic Bayesian Networks in Reconstructing Functional Neuronal Networks from Spike Train Ensembles", Neural Computation 22 (2010): 158--189
- Sergi Elizalde and Kevin Woods, "Bounds on the number of inference functions of a graphical model", math.CO/0610233
- Freedman, "On Specifying Graphical Models for Causation," UCB Stat. Tech. Rep. 601 [abstract, pdf]
- Green, Hjort and Richardson (eds.), Highly Structured Stochastic Systems
- Sanjiang Li, "Causal models have no complete axiomatic characterization", arxiv:0804.2401
- Philipp Rütimann and Peter Bühlmann, "High dimensional sparse covariance estimation via directed acyclic graphs", arxiv:0911.2375 = Electronic Journal of Statistics 3 (2009): 1133--1160
- Dino Sejdinovic, Arthur Gretton, Wicher Bergsma, "A Kernel Test for Three-Variable Interactions", NIPS 2013
- Bill Shipley, Cause and Correlation in Biology: A User's Guide to Path Analysis, Structural Equations and Causal Inference
- Ilya Shpitser, Judea Pearl, "Complete Identification Methods for the Causal Hierarchy", Journal of Machine Learning Research 9 (2008): 1941--1979 ["We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple "parallel worlds" and resulting from simultaneous, possibly conflicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy"]
- Ricardo Silva, Richard Scheines, Clark Glymour, Peter L. Spirtes, "Learning Measurement Models for Unobserved Variables", UAI 2003, arxiv:1212.2516
- Peter Spirtes
- "Graphical models, causal inference, and econometric models", Journal of Economic Methodology 12 (2005): 1--33 [PDF]
- "Introduction to Causal Inference", Journal of Machine Learning Research 11 (2010): 1643--1662
- "Variable Definition and Causal Inference", Proceedings of the 13th International Congress of Logic Methodology and Philosophy of Science, pp. 514--53 PDF reprint via Prof. Spirtes]
- Achim Tresch, Florian Markowetz, "Structure Learning in Nested Effects Models", 0710.4481
- Sara van de Geer, "On the uniform convergence of empirical norms and inner products, with application to causal inference", arxiv:1310.5523
- Tyler J. VanderWeele and James M. Robins
- "Minimal sufficient causation and directed acyclic graphs", Annals of Statistics 37 (2009): 1437--1465
- "Properties of Monotonic Effects on Directed Acyclic Graphs", Journal of Machine Learning Research 10 (2009): 699--718
- "Signed directed acyclic graphs for causal inference", Journal of the Royal Statistical Society B 72 (2010): 111--127
- Vivian Viallon, Onureena Banerjee, Gregoire Rey, Eric Jougla, Joel Coste, "An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates", arxiv:1004.2287
- Jiji Zhang, "Causal Reasoning with Ancestral Graphs", Journal of Machine Learning Research 9 (2008): 1437--1474
- Zhang Jiji and Peter Spirtes
(Thanks to Gustavo Lacerda for pointing out a goof.)