Notebooks

## Graphical Causal Models

17 Oct 2019 12:32

A species of the broader genus of graphical models, especially intended to help with problems of causal inference.

Everyone who takes basic statistics has it drilled into them that "correlation is not causation." (When I took psych. 1, the professor said he hoped that, if he were to come to us on our death-beds and prompt us with "Correlation is," we would all respond "not causation.") This is a problem, because one can infer correlation from data, and would like to be able to make inferences about causation. There are typically two ways out of this. One is to perform an experiment, preferably a randomized double-blind experiment, to eliminate accidental sources of correlation, common causes, etc. That's nice when you can do it, but impossible with supernovae, and not even easy with people. The other out is to look for correlations, say that of course they don't equal causations, and then act as if they did anyway. The technical names for this latter course of action are "linear regression" and "analysis of variance," and they form the core of applied quantitative social science, e.g., The Bell Curve.

Graphical models are, in part, a way of escaping from this impasse.

The basic idea is as follows. You have a bunch of variables, and you want to represent the causal relationships, or at least the probabilistic dependencies, between them. You do so by means of a graph. Each node in the graph stands for a variable. If variable A is a cause of B, then an arrow runs from A to B. If A is a cause of B, we also say that A is one of B's parents, and B one of A's children. If there is a causal path from A to B, then A is an ancestor of B, and B is a descendant of A. If a variable has no parents in the graph, it is exogenous, otherwise it is endogenous.

Part of what we mean by "cause" is that, when we know the immediate causes, the remoter causes are irrelevant --- given the parents, remoter ancestors don't matter. The standard example is that applying a flame to a piece of cotton will cause it to burn, whether the flame came from a match, spark, lighter or what-not. Probabilistically, this is a conditional indepedence property, or a Markov property: a variable is independent of its ancestors conditional on its parents. In fact, given its parents, its children, and its childrens' other parents, a variable is conditionally independent of all other variables. This is called the graphical or causal Markov property. When this holds, we can factor the joint probability distribution for all the variables into the product of the distribution of the exogenous variables, and the conditional distribution for each endogenous variable given its parents.

(You may be wondering what happens if A is a parent of B and B is a parent of A, as can happen when there is feedback between the variables. This leads to difficulties, traditionally dealt with by explicitly limiting the discussion to acyclic graphs. I shall follow this wise precedent here.)

Now, there are certain rules which let us infer conditional independence relations from each other. For instance, if X is independent of the combination of Y and W, given Z, then X is indepdent of Y alone given Z. So, if we have a graph which obeys the causal Markov condition, there are generally other conditional independence relations which follow from the basic ones. If these are the only conditional indepences which hold in the distribution, it is said to be faithful to the graph (or vice versa); otherwise it is unfaithful. For a graph to be Markov and unfaithful, there must (as it were) be an elaborate conspiracy among the conditional distributions, so elaborate that it will generally be destroyed by any change in any of those distributions. So faithfulness is a robust property.

This may sound pretty arcane, but that's just because it is arcane. The point, however, is that if you can make the three assumptions above (no causal cycles, Markov property, faithfulness), you're in business in a really remarkable way. There are very powerful statistical techniques that will let you infer the causal structure connecting your variables. This comes in two flavors. One is the Bayesian way: cook up a prior distribution over all possible causal graphs; compute the likelihood of the data under each graph; update your distribution over graphs; iterate. This is generally computationally intractable, assuming you can come up with a meaningful prior in the first place. The other approach is to use tests for conditional independence to eliminate possible connections between variables, and so to narrow down the range of candidate structures; it is basically frequentist, and can be shown, under a broad range of circumstances, to be asymptotically reliable.

Once you have your causal graph --- whether through estimation or through simply being handed one --- you can do lots of great things with it, like predict the effects of manipulating some of the variables, or make backward inferences from effects to causes. Of course, if the graph is big, doing the necessary calculations can be very troublesome in itself, and so people work on approximation methods and even ways of doing statistical inference on models of statistical distributions...

It's probably obvious I think this is incredibly neat, and even one of the most important ideas to come out of machine learning. Of course it doesn't really solve the problem of establishing causal relations, in the way Hume objected to; it says, assuming there are causal relations, of a certain stochastic form, and that these are stable, then they can be learned. But that, and the more general questions of what we ought to mean by "cause", deserve a notebook of their own.

Things I want to understand better: Structure discovery. (Once you know the graph, parameter estimation, and even nonparametric estimation, is, by construction, straightforward.) Statistical learning theory for graphical models. (The paper by Janzing and Herrmann is a good start). How to treat systems with feedback? How to treat dynamical systems and time series? How does all of this fit together with computational mechanics?

Recommended, more general:
• Clark Glymour, The Mind's Arrows: Bayes Nets and Graphical Causal Models in Psychology [Mini-review]
• Michael Irwin Jordan (ed.), Learning in Graphical Models
• Jordan and Sejnowski (eds.), Graphical Models [Nice collection of papers from Neural Computation]
• Judea Pearl
• "Causal Inference in Statistics: An Overview", Statistics Surveys 3 (2009): 96--146 [PDF]
• Causality: Models, Reasoning and Inference
• Peter Spirtes, Clark Glymour and Richard Scheines, Causation, Prediction, and Search [Comments]
Recommended, more specialized:
• Séverine Affeldt and Hervé Isambert, "Robust reconstruction of causal graphical models based on conditional 2-point and 3-point information", UAI 2015
• Diego Colombo, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson, "Learning High-Dimensional Directed Acyclic Graphs with Latent and Selection Variables", Annals of Statistics 40 (2012): 294--321, arxiv:1104.5617
• Vanessa Didelez, "Causal Reasoning for Events in Continuous Time: A Decisionâ€“Theoretic Approach", UAI 2015
• Michael Eichler and Vanessa Didelez, "Causal Reasoning in Graphical Time Series Models", UAI 2007, arxiv:1206.5246
• Antti Hyttinen, Frederick Eberhardt and Matti Järvisalo, "Do-calculus when the True Graph Is Unknown", UAI 2015
• Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", Journal of Machine Learning Research 11 (2010): 1709--1731
• Dominik Janzing and Daniel J. L. Herrmann, "Reliable and Efficient Inference of Bayesian Networks from Sparse Data by Statistical Learning Theory", cs.LG/0309015
• Markus Kalisch and Peter Bühlmnann, "Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm", Journal of Machine Learning Research 8 (2007): 616--636
• John C. Loehlin, Latent Variable Models: An Introduction to Factor, Path, and Structural Analysis [An intro. to old-school linear latent-variable models, especially of the sort used by psychologists. Good in its own domain, but does not make enough contact with modern graphical models.]
• Marloes H. Maathuis, Diego Colombo, Markus Kalisch and Peter Bühlmann, "Predicting causal effects in large-scale systems from observational data", Nature Methods 7 (2010): 247--248 [PDF reprints of paper and supplementary information]
• Katerina Marazopoulou, Marc Maier and David Jensen, "Learning the Structure of Causal Models with Relational and Temporal Dependence", UAI 2015
• Emilija Perković, Johannes Textor, Markus Kalisch and Marloes H. Matthuis, "A Complete Generalized Adjustment Criterion", UAI 2015, arxiv:1507.01524
• Jonas Peters, Peter Bühlmann, "Structural Intervention Distance (SID) for Evaluating Causal Graphs", arxiv:1306.1043
• Christopher J. Quinn, Todd P. Coleman, Negar Kiyavash, "Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes", arxiv:1101.5108
• Thomas Richardson
• Thomas S. Richardson and James M. Robins, "Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality" [PDF preprint, 140+ pp.; thanks to Betsy Ogburn for telling me about this]
• James Robins and Thomas Richardson, "Alternative Graphical Causal Models and the Identification of Direct Effects" [PDF preprint]
• Robert E. Tillman, Arthur Gretton and Peter Spirtes, "Nonlinear Directed Acyclic Structure Learning with Weakly Additive Noise Models", NIPS 2009 [Thanks to Prof. Spirtes for a preprint]
• Pawel Wocjan, Dominik Janzing, and Thomas Beth, "Required sample size for learning sparse Bayesian networks with many variables," cs.LG/0204052
• Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos, "Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification"
• Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", arxiv:1107.1283 [This sounds very much like Spearman's "tetrad equations" from 100 years ago!]
• Holly Andersen, "When to expect violations of causal faithfulness and why it matters", phil-sci/9204
• Clive G. Bowsher, "Stochastic kinetic models: Dynamic independence, modularity and graphs", Annals of Statistics 38 (2010): 2242--2281
• Facuno Bromberg and Dimitris Margaritis, "Improving the Reliability of Causal Discovery from Small Data Sets Using Argumentation", Journal of Machine Learning Research 10 (2009): 301--340
• Elias Chaibub Neto, Mark P. Keller, Alan D. Attie, and Brian S. Yandell, "Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes", Annals of Applied Statistics 4 (2010): 320--339
• David Maxwell Chickering, "Optimal Structure Identification With Greedy Search," Journal of Machine Learning Research 3 (2002): 507--554
• David Cox and Nanny Warmuth, Multivariate Dependcencies: Models, Analysis, and Interpretation
• Luis M. de Campos, "A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests", Journal of Machine Learning Research 7 (2006): 2149--2187 [Sounds cool]
• Vanessa Didelez, "Graphical models for marked point processes based on local independence", arxiv:0710.5874
• Vanessa Didelez, Svend Kreiner and Niels Keiding, "Graphical Models for Inference Under Outcome-Dependent Sampling", Statistical Science 25 (2010): 368--387, arxiv:1101.0901
• Mathias Drton, Rina Foygel, and Seth Sullivant, "Global identifiability of linear structural equation models", Annals of Statistics 39 (2011): 865--886
• Seif Eldawlatly, Yang Zhou, Rong Jin and Karim G. Oweiss, "On the Use of Dynamic Bayesian Networks in Reconstructing Functional Neuronal Networks from Spike Train Ensembles", Neural Computation 22 (2010): 158--189
• Sergi Elizalde and Kevin Woods, "Bounds on the number of inference functions of a graphical model", math.CO/0610233
• Freedman, "On Specifying Graphical Models for Causation," UCB Stat. Tech. Rep. 601 [abstract, pdf]
• Glymour and Cooper (eds.), Computation, Causation and Discovery
• Green, Hjort and Richardson (eds.), Highly Structured Stochastic Systems
• Alain Hauser, Peter Bühlmann, "Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs", Journal of Machine Learning Research 13 (2012): 2409--2464, arxiv:1104.2808
• Yang-Bo He and Zhi Geng, "Active Learning of Causal Networks with Intervention Experiments and Optimal Designs", Journal of Machine Learning Research 9 (2008): 2523--2547
• Markus Kalisch and Peter Bühlmnann, "Robustification of the PC-Algorithm for Directed Acyclic Graphs", Journal of Computational and Graphical Statistics 17 (2008): 773--789
• Junning Li, Z. Jane Wang, "Controlling the False Discovery Rate of the Association/Causality Structure Learned with the PC Algorithm", Journal of Machine Learning Research 10 (2009): 475--514
• Sanjiang Li, "Causal models have no complete axiomatic characterization", arxiv:0804.2401
• Marloes H. Maathuis, Markus Kalisch, Peter Bühlmann, "Estimating high-dimensional intervention effects from observational data", Annals of Statistics 37 (2009): 3133--31654, arxiv:0810.4214
• Dhafer Malouche and Sylvie Sevestre-Ghalila, "Estimating High dimensional faithful Gaussian graphical Models : uPC-algorithm", arxiv:0705.1613
• Jean-Philippe Pellett and Andre Elisseeff, "Using Markov Blankets for Causal Structure Learning", Journal of Machine Learning Research 9 (2008): 1295--1342
• Sergey Plis, David Danks and Jianyu Yang, "Mesochronal Structure Learning", UAI 2015
• Philipp Rütimann and Peter Bühlmann, "High dimensional sparse covariance estimation via directed acyclic graphs", arxiv:0911.2375 = Electronic Journal of Statistics 3 (2009): 1133--1160
• Dino Sejdinovic, Arthur Gretton, Wicher Bergsma, "A Kernel Test for Three-Variable Interactions", NIPS 2013
• Bill Shipley, Cause and Correlation in Biology: A User's Guide to Path Analysis, Structural Equations and Causal Inference
• Ilya Shpitser, Judea Pearl, "Complete Identification Methods for the Causal Hierarchy", Journal of Machine Learning Research 9 (2008): 1941--1979 ["We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple "parallel worlds" and resulting from simultaneous, possibly conflicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy"]
• Ricardo Silva, Richard Scheines, Clark Glymour, Peter L. Spirtes, "Learning Measurement Models for Unobserved Variables", UAI 2003, arxiv:1212.2516
• Silva, Scheines, Glymour and Spirtes, "Learning the Structure of Linear Latent Variable Models", Journal of Machine Learning Research 7 (2006): 191--246 [open access]
• Peter Spirtes
• Achim Tresch, Florian Markowetz, "Structure Learning in Nested Effects Models", 0710.4481
• Sara van de Geer, Peter Bühlmann, "$\ell_0$-penalized maximum likelihood for sparse directed acyclic graphs", arxiv:1205.5473
• Tyler J. VanderWeele and James M. Robins
• Vivian Viallon, Onureena Banerjee, Gregoire Rey, Eric Jougla, Joel Coste, "An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates", arxiv:1004.2287
• Xianchao Xie, Zhi Geng, "A Recursive Method for Structural Learning of Directed Acyclic Graphs", Journal of Machine Learning Research 9 (2008): 459--483
• Raanan Yehezkel, Boaz Lerner, "Bayesian Network Structure Learning by Recursive Autonomy Identification", Journal of Machine Learning Research 10 (2009): 1527--1570
• Jiji Zhang, "Causal Reasoning with Ancestral Graphs", Journal of Machine Learning Research 9 (2008): 1437--1474
• Zhang Jiji and Peter Spirtes
• "Detection of Unfaithfulness and Robust Causal Inference", phil-sc/3188
• "Strong Faithfulness and Uniform Consistency in Causal Inference", UAI 2003, arxiv:1212.2506 [I can't remember if I read this back in 2003 or not]

(Thanks to Gustavo Lacerda for pointing out a goof.)