## Graphical Causal Models

*03 Aug 2015 15:43*

A species of the broader genus of graphical models, especially intended to help with problems of causal inference.

Everyone who takes basic statistics has it drilled into them that
"correlation is not causation." (When I took psych. 1, the professor said he
hoped that, if he were to come to us on our death-beds and prompt us with
"Correlation is," we would all respond "not causation.") This is a problem,
because one can infer correlation from data, and would *like* to be able
to make inferences about causation. There are typically two ways out of this.
One is to perform an experiment, preferably a randomized double-blind
experiment, to eliminate accidental sources of correlation, common causes, etc.
That's nice when you can do it, but impossible with supernovae, and not even
easy with people. The other out is to look for correlations, say that of
course they don't equal causations, and then act as if they did anyway. The
technical names for this latter course of action are "linear regression" and
"analysis of variance," and they form the core of applied quantitative social
science, e.g., The Bell Curve.

Graphical models are, in part, a way of escaping from this impasse.

The basic idea is as follows. You have a bunch of variables, and you want
to represent the causal relationships, or at least the probabilistic
dependencies, between them. You do so by means of a graph. Each node in the
graph stands for a variable. If variable A is a cause of B, then an arrow runs
from A to B. If A is a cause of B, we also say that A is one of B's
*parents,* and B one of A's *children.* If there is a causal path
from A to B, then A is an *ancestor* of B, and B is a
*descendant* of A. If a variable has no parents in the graph, it is
*exogenous,* otherwise it is *endogenous.*

Part of what we mean by "cause" is that, when we know the immediate causes, the remoter causes are irrelevant --- given the parents, remoter ancestors don't matter. The standard example is that applying a flame to a piece of cotton will cause it to burn, whether the flame came from a match, spark, lighter or what-not. Probabilistically, this is a conditional indepedence property, or a Markov property: a variable is independent of its ancestors conditional on its parents. In fact, given its parents, its children, and its childrens' other parents, a variable is conditionally independent of all other variables. This is called the graphical or causal Markov property. When this holds, we can factor the joint probability distribution for all the variables into the product of the distribution of the exogenous variables, and the conditional distribution for each endogenous variable given its parents.

(You may be wondering what happens if A is a parent of B and B is a parent of A, as can happen when there is feedback between the variables. This leads to difficulties, traditionally dealt with by explicitly limiting the discussion to acyclic graphs. I shall follow this wise precedent here.)

Now, there are certain rules which let us infer conditional independence
relations from each other. For instance, if X is independent of the
combination of Y and W, given Z, then X is indepdent of Y alone given Z. So,
if we have a graph which obeys the causal Markov condition, there are generally
other conditional independence relations which follow from the basic ones. If
these are the only conditional indepences which hold in the distribution, it is
said to be *faithful* to the graph (or vice versa); otherwise it is
unfaithful. For a graph to be Markov and unfaithful, there must (as it were)
be an elaborate conspiracy among the conditional distributions, so elaborate
that it will generally be destroyed by any change in any of those
distributions. So faithfulness is a robust property.

This may sound pretty arcane, but that's just because it *is* arcane.
The point, however, is that if you can make the three assumptions above (no
causal cycles, Markov property, faithfulness), you're in business in a really
remarkable way. There are very powerful statistical techniques that will let
you infer the causal structure connecting your variables. This comes in two
flavors. One is the Bayesian way: cook up a prior distribution over all
possible causal graphs; compute the likelihood of the data under each graph;
update your distribution over graphs; iterate. This is generally
computationally intractable, assuming you can come up with a meaningful prior
in the first place. The other approach is to use tests for conditional
independence to eliminate possible connections between variables, and so to
narrow down the range of candidate structures; it is basically frequentist, and
can be shown, under a broad range of circumstances, to be asymptotically
reliable.

Once you have your causal graph --- whether through estimation or through simply being handed one --- you can do lots of great things with it, like predict the effects of manipulating some of the variables, or make backward inferences from effects to causes. Of course, if the graph is big, doing the necessary calculations can be very troublesome in itself, and so people work on approximation methods and even ways of doing statistical inference on models of statistical distributions...

It's probably obvious I think this is incredibly neat, and even one of the
most important ideas to come out
of machine learning. Of course
it doesn't *really* solve the problem of establishing causal relations,
in the way Hume objected to; it says, assuming there
are causal relations, of a certain stochastic form, and that these are stable,
then they can be learned. But that, and the more general questions of what we
ought to mean by "cause", deserve a notebook of their
own.

Things I want to understand better: Structure discovery: the PC algorithm and extensions. (Once you know the graph, parameter estimation, and even nonparametric estimation, is, by construction, straightforward.) Statistical learning theory for graphical models. (The paper by Janzing and Herrmann is a good start). How to treat systems with feedback? How to treat dynamical systems and time series? How does all of this fit together with computational mechanics?

- Recommended, more general:
- Clark Glymour, The Mind's Arrows: Bayes Nets and Graphical Causal Models in Psychology [Mini-review]
- Michael Irwin Jordan (ed.), Learning in Graphical Models
- Jordan and Sejnowski (eds.), Graphical Models [Nice collection of papers from Neural Computation]
- Judea Pearl
- "Causal Inference in Statistics: An Overview", forthcoming
in Statistics Surveys
**3**(2009): 96--146 [PDF] - Causality: Models, Reasoning and Inference

- "Causal Inference in Statistics: An Overview", forthcoming
in Statistics Surveys
- Peter Spirtes, Clark Glymour and Richard Scheines, Causation, Prediction, and Search [Comments]

- Recommended, more specialized:
- Séverine Affeldt and Hervé Isambert, "Robust reconstruction of causal graphical models based on conditional 2-point and 3-point information", UAI 2015
- Diego Colombo, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson, "Learning High-Dimensional Directed Acyclic Graphs with Latent and Selection Variables", Annals of Statistics
**40**(2012): 294--321, arxiv:1104.5617 - Vanessa Didelez, "Causal Reasoning for Events in Continuous Time: A Decisionâ€“Theoretic Approach", UAI 2015
- Michael Eichler and Vanessa Didelez, "Causal Reasoning in Graphical Time Series Models", UAI 2007, arxiv:1206.5246
- Antti Hyttinen, Frederick Eberhardt and Matti Järvisalo, "Do-calculus when the True Graph Is Unknown", UAI 2015
- Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", Journal of Machine Learning Research
**11**(2010): 1709--1731 - Dominik Janzing and Daniel J. L. Herrmann, "Reliable and Efficient Inference of Bayesian Networks from Sparse Data by Statistical Learning Theory", cs.LG/0309015
- Markus Kalisch and Peter Bühlmnann, "Estimating
High-Dimensional Directed Acyclic Graphs with the
PC-Algorithm", Journal
of Machine Learning Research
**8**(2007): 616--636 - John C. Loehlin, Latent Variable Models: An Introduction to Factor, Path, and Structural Analysis [An intro. to old-school linear latent-variable models, especially of the sort used by psychologists. Good in its own domain, but does not make enough contact with modern graphical models.]
- Marloes H. Maathuis, Diego Colombo, Markus Kalisch and Peter Bühlmann, "Predicting causal effects in large-scale systems from observational data", Nature Methods
**7**(2010): 247--248 [PDF reprints of paper and supplementary information] - Katerina Marazopoulou, Marc Maier and David Jensen, "Learning the Structure of Causal Models with Relational and Temporal Dependence", UAI 2015
- Jonas Peters, Peter Bühlmann, "Structural Intervention Distance (SID) for Evaluating Causal Graphs", arxiv:1306.1043
- Christopher J. Quinn, Todd P. Coleman, Negar Kiyavash, "Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes", arxiv:1101.5108
- Thomas Richardson
- "A Discovery Algorithm for Directed Cyclic Graphs", UAI 1996 [Short version without details]
- "A Discovery Algorithm for Directed Cyclic Graphs", Technical Report 68, CMU department of philosophy (postscript) [Long version with details]

- Thomas S. Richardson and James M. Robins, "Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality" [PDF preprint, 140+ pp.; thanks to Betsy Ogburn for telling me about this]
- James Robins and Thomas Richardson, "Alternative Graphical Causal Models and the Identification of Direct Effects" [PDF preprint]
- Robert E. Tillman, Arthur Gretton and Peter Spirtes, "Nonlinear Directed Acyclic Structure Learning with Weakly Additive Noise Models", NIPS 2009 [Thanks to Prof. Spirtes for a preprint]
- Pawel Wocjan, Dominik Janzing, and Thomas Beth, "Required sample size for learning sparse Bayesian networks with many variables," cs.LG/0204052

- Modesty forbids me to recommend:
- CRS, Advanced Data Analysis from an Elementary Point of View, Part III (chapters on causal inference for statistics students)
- CRS and Andrew C. Thomas, "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies", arxiv:1004.4704 [Less-technical weblog version]

- To read:
- Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos, "Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification"
- "Part I: Algorithms and Empirical Evaluation", Journal of Machine Learning Research
**11**(2010): 171--234 - "Part II: Analysis and Extensions", Journal of Machine Learning Research
**11**(2010): 235--284

- "Part I: Algorithms and Empirical Evaluation", Journal of Machine Learning Research
- Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", arxiv:1107.1283 [This sounds very much like Spearman's "tetrad equations" from 100 years ago!]
- Holly Andersen, "When to expect violations of causal faithfulness and why it matters", phil-sci/9204
- Clive G. Bowsher, "Stochastic kinetic models: Dynamic independence,
modularity and
graphs", Annals
of Statistics
**38**(2010): 2242--2281 - Facuno Bromberg and Dimitris Margaritis, "Improving the Reliability
of Causal Discovery from Small Data Sets Using Argumentation",
Journal of Machine Learning Research
**10**(2009): 301--340 - Elias Chaibub Neto, Mark P. Keller, Alan D. Attie, and Brian S. Yandell, "Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes", Annals of Applied Statistics
**4**(2010): 320--339 - David Maxwell Chickering, "Optimal Structure Identification
With Greedy Search," Journal of Machine Learning Research
**3**(2002): 507--554 - David Cox and Nanny Warmuth, Multivariate Dependcencies: Models, Analysis, and Interpretation
- Luis M. de Campos, "A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests",
Journal of
Machine Learning Research
**7**(2006): 2149--2187 [Sounds cool] - Vanessa Didelez, "Graphical models for marked point processes based on local independence", arxiv:0710.5874
- Vanessa Didelez, Svend Kreiner and Niels Keiding, "Graphical Models
for Inference Under Outcome-Dependent
Sampling", Statistical
Science
**25**(2010): 368--387, arxiv:1101.0901 - Mathias Drton, Rina Foygel, and Seth Sullivant, "Global identifiability of linear structural equation models", Annals of
Statistics
**39**(2011): 865--886 - Seif Eldawlatly, Yang Zhou, Rong Jin
and Karim G. Oweiss, "On the Use of Dynamic Bayesian Networks in Reconstructing Functional Neuronal Networks from Spike Train Ensembles", Neural Computation
**22**(2010): 158--189 - Sergi Elizalde and Kevin Woods, "Bounds on the number of inference functions of a graphical model", math.CO/0610233
- Freedman, "On Specifying Graphical Models for Causation," UCB Stat. Tech. Rep. 601 [abstract, pdf]
- Glymour and Cooper (eds.), Computation, Causation and Discovery
- Green, Hjort and Richardson (eds.), Highly Structured Stochastic Systems
- Alain Hauser, Peter Bühlmann, "Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs",
Journal of Machine Learning Research
**13**(2012): 2409--2464, arxiv:1104.2808 - Yang-Bo He and Zhi Geng, "Active Learning of Causal
Networks with Intervention Experiments and Optimal Designs",
Journal of
Machine Learning Research
**9**(2008): 2523--2547 - Markus Kalisch and Peter Bühlmnann, "Robustification of the PC-Algorithm for Directed Acyclic Graphs", Journal of Computational and
Graphical Statistics
**17**(2008): 773--789 - Junning Li, Z. Jane Wang, "Controlling the False Discovery Rate of the Association/Causality Structure Learned with the PC Algorithm", Journal of Machine
Learning Research
**10**(2009): 475--514 - Sanjiang Li, "Causal models have no complete axiomatic characterization", arxiv:0804.2401
- Marloes H. Maathuis, Markus Kalisch, Peter Bühlmann, "Estimating high-dimensional intervention effects from observational data", Annals
of Statistics
**37**(2009): 3133--31654, arxiv:0810.4214 - Dhafer Malouche and Sylvie Sevestre-Ghalila, "Estimating High dimensional faithful Gaussian graphical Models : uPC-algorithm", arxiv:0705.1613
- Jean-Philippe Pellett and Andre Elisseeff, "Using Markov Blankets for Causal Structure Learning", Journal of Machine Learning
Research
**9**(2008): 1295--1342 - Emilija Perković, Johannes Textor, Markus Kalisch and Marloes H. Matthuis, "A Complete Generalized Adjustment Criterion"
- Sergey Plis, David Danks and Jianyu Yang, "Mesochronal Structure Learning", UAI 2015
- Philipp Rütimann and Peter Bühlmann, "High
dimensional sparse covariance estimation via directed acyclic graphs",
arxiv:0911.2375 = Electronic
Journal of Statistics
**3**(2009): 1133--1160 - Dino Sejdinovic, Arthur Gretton, Wicher Bergsma, "A Kernel Test for Three-Variable Interactions", NIPS 2013
- Bill Shipley, Cause and Correlation in Biology: A User's Guide to Path Analysis, Structural Equations and Causal Inference
- Ilya Shpitser, Judea Pearl, "Complete Identification Methods for
the Causal
Hierarchy", Journal of
Machine Learning Research
**9**(2008): 1941--1979 ["We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple "parallel worlds" and resulting from simultaneous, possibly conflicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy"] - Ricardo Silva, Richard Scheines, Clark Glymour, Peter L. Spirtes, "Learning Measurement Models for Unobserved Variables", UAI 2003, arxiv:1212.2516
- Silva, Scheines, Glymour and Spirtes, "Learning the Structure
of Linear Latent Variable Models", Journal of Machine Learning Research
**7**(2006): 191--246 [open access] - Peter Spirtes
- "Graphical models, causal inference, and
econometric models", Journal of Economic Methodology
**12**(2005): 1--33 [PDF] - "Introduction to Causal Inference", Journal
of Machine Learning Research
**11**(2010): 1643--1662

- "Graphical models, causal inference, and
econometric models", Journal of Economic Methodology
- Achim Tresch, Florian Markowetz, "Structure Learning in Nested Effects Models", 0710.4481
- Sara van de Geer, Peter Bühlmann, "$\ell_0$-penalized maximum likelihood for sparse directed acyclic graphs", arxiv:1205.5473
- Tyler J. VanderWeele and James M. Robins
- "Minimal sufficient causation and directed acyclic graphs", Annals of Statistics
**37**(2009): 1437--1465 - "Properties of Monotonic Effects on Directed
Acyclic Graphs", Journal of Machine Learning
Research
**10**(2009): 699--718 - "Signed directed acyclic graphs for causal inference",
Journal of the Royal Statistical Society B
**72**(2010): 111--127

- "Minimal sufficient causation and directed acyclic graphs", Annals of Statistics
- Vivian Viallon, Onureena Banerjee, Gregoire Rey, Eric Jougla, Joel Coste, "An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates", arxiv:1004.2287
- Xianchao Xie, Zhi Geng, "A Recursive Method for Structural Learning
of Directed Acyclic Graphs", Journal of
Machine Learning Research
**9**(2008): 459--483 - Raanan Yehezkel, Boaz Lerner, "Bayesian Network Structure Learning by Recursive Autonomy Identification", Journal of
Machine Learning Research
**10**(2009): 1527--1570 - Jiji Zhang, "Causal Reasoning with Ancestral Graphs",
Journal of
Machine Learning Research
**9**(2008): 1437--1474 - Zhang Jiji and Peter Spirtes
- "Detection of Unfaithfulness and Robust Causal Inference", phil-sc/3188
- "Strong Faithfulness and Uniform Consistency in Causal Inference", UAI 2003, arxiv:1212.2506 [I can't remember if I read this back in 2003 or not]

(Thanks to Gustavo Lacerda for pointing out a goof.)