Notebooks
http://bactra.org/notebooks
Cosma's Notebooks: Graphical Causal Models
http://bactra.org/notebooks/2021/07/09#graphical-causal-models
<P>A species of the broader genus of <a href="graphical-models.html">graphical
models</a>, especially intended to help with problems
of <a href="causal-inference.html">causal inference</a>.
<P>Everyone who takes basic statistics has it drilled into them that
"correlation is not causation." (When I took psych. 1, the professor said he
hoped that, if he were to come to us on our death-beds and prompt us with
"Correlation is," we would all respond "not causation.") This is a problem,
because one can infer correlation from data, and would <em>like</em> to be able
to make inferences about causation. There are typically two ways out of this.
One is to perform an experiment, preferably a randomized double-blind
experiment, to eliminate accidental sources of correlation, common causes, etc.
That's nice when you can do it, but impossible with supernovae, and not even
easy with people. The other out is to look for correlations, say that of
course they don't equal causations, and then act as if they did anyway. The
technical names for this latter course of action are "linear regression" and
"analysis of variance," and they form the core of applied quantitative social
science, e.g., <cite>The Bell Curve.</cite>
<P>Graphical models are, in part, a way of escaping from this impasse.
<P>The basic idea is as follows. You have a bunch of variables, and you want
to represent the causal relationships, or at least the probabilistic
dependencies, between them. You do so by means of a graph. Each node in the
graph stands for a variable. If variable A is a cause of B, then an arrow runs
from A to B. If A is a cause of B, we also say that A is one of B's
<em>parents,</em> and B one of A's <em>children.</em> If there is a causal path
from A to B, then A is an <em>ancestor</em> of B, and B is a
<em>descendant</em> of A. If a variable has no parents in the graph, it is
<em>exogenous,</em> otherwise it is <em>endogenous.</em>
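<P>To make the vocabulary concrete, here is a minimal sketch in Python. The edge list and the function names are made up for illustration; nothing here comes from a particular library.

```python
# Hypothetical example graph: each pair (cause, effect) is one arrow.
EDGES = [("A", "B"), ("B", "C"), ("D", "C")]


def parents(node, edges):
    """Direct causes of `node`."""
    return {cause for cause, effect in edges if effect == node}


def children(node, edges):
    """Direct effects of `node`."""
    return {effect for cause, effect in edges if cause == node}


def _reachable(node, edges, step):
    """Everything reachable from `node` by repeatedly applying `step`."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for neighbour in step(current, edges):
            if neighbour not in found:
                found.add(neighbour)
                frontier.append(neighbour)
    return found


def ancestors(node, edges):
    """Variables with a causal path leading into `node`."""
    return _reachable(node, edges, parents)


def descendants(node, edges):
    """Variables with a causal path leading out of `node`."""
    return _reachable(node, edges, children)


def exogenous(edges):
    """Variables with no parents in the graph."""
    nodes = {n for edge in edges for n in edge}
    return {n for n in nodes if not parents(n, edges)}
```

<P>In this toy graph, A and D are exogenous, B and C are endogenous, and C has A, B and D as ancestors.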
<P>Part of what we mean by "cause" is that, when we know the immediate causes,
the remoter causes are irrelevant --- given the parents, remoter ancestors
don't matter. The standard example is that applying a flame to a piece of
cotton will cause it to burn, whether the flame came from a match, spark,
lighter or what-not. Probabilistically, this is a conditional independence
property, or a Markov property: a variable is independent of its ancestors
conditional on its parents. In fact, given its parents, its children, and its
children's other parents, a variable is conditionally independent of all other
variables. This is called the graphical or causal Markov property. When this
holds, we can factor the joint probability distribution for all the variables
into the product of the distribution of the exogenous variables, and the
conditional distribution for each endogenous variable given its parents.
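<P>A small numerical check may help here. The sketch below builds the joint distribution of a hypothetical binary chain A -> B -> C from the factorization (the distribution of the exogenous A, times each endogenous variable's conditional distribution given its parent), and then recovers the Markov property from that joint. All the probability tables are invented for the example.

```python
import itertools

# Hypothetical conditional probability tables for the chain A -> B -> C.
p_a = {0: 0.7, 1: 0.3}                                      # P(A = a)
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}    # [a][b] = P(B = b | A = a)
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # [b][c] = P(C = c | B = b)


def joint(a, b, c):
    """Joint probability via the Markov factorization."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def prob_c_given(c, b, a=None):
    """P(C=c | B=b), or P(C=c | B=b, A=a) if `a` is supplied, from the joint."""
    a_values = (0, 1) if a is None else (a,)
    numerator = sum(joint(x, b, c) for x in a_values)
    denominator = sum(joint(x, b, y) for x in a_values for y in (0, 1))
    return numerator / denominator


# The factorized joint is a proper distribution.
total = sum(joint(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))

# Once the parent B is known, the remoter ancestor A makes no difference to C.
max_gap = max(
    abs(prob_c_given(c, b, a) - prob_c_given(c, b))
    for a, b, c in itertools.product((0, 1), repeat=3)
)
```

<P>Here <code>total</code> comes out as 1 and <code>max_gap</code> as zero, up to floating-point error.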
<P>(You may be wondering what happens if A is a parent of B and B is a parent
of A, as can happen when there is feedback between the variables. This leads
to difficulties, traditionally dealt with by explicitly limiting the discussion
to acyclic graphs. I shall follow this wise precedent here.)
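<P>Checking acyclicity is mechanical: a directed graph is acyclic exactly when its variables can be listed so that every cause comes before its effects. A minimal sketch, using the standard library's <code>graphlib</code> module (Python 3.9 or later); the function name <code>causal_order</code> is my own invention.

```python
from graphlib import CycleError, TopologicalSorter


def causal_order(edges):
    """Return the variables with causes before effects, or None if there is a cycle."""
    predecessors = {}
    for cause, effect in edges:
        predecessors.setdefault(cause, set())
        predecessors.setdefault(effect, set()).add(cause)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


chain_order = causal_order([("A", "B"), ("B", "C")])   # a valid ordering exists
feedback = causal_order([("A", "B"), ("B", "A")])      # None: A and B cause each other
```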
<P>Now, there are certain rules which let us infer conditional independence
relations from each other. For instance, if X is independent of the
combination of Y and W, given Z, then X is independent of Y alone given Z. So,
if we have a graph which obeys the causal Markov condition, there are generally
other conditional independence relations which follow from the basic ones. If
these are the only conditional independences which hold in the distribution, it is
said to be <em>faithful</em> to the graph (or vice versa); otherwise it is
unfaithful. For a graph to be Markov and unfaithful, there must (as it were)
be an elaborate conspiracy among the conditional distributions, so elaborate
that it will generally be destroyed by any change in any of those
distributions. So faithfulness is a robust property.
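<P>The "conspiracy" is easiest to see in a linear model. Assume, purely for illustration, that X causes Z both directly and through Y, with Y = a*X + noise and Z = b*Y + c*X + noise, where the noise terms are independent of each other and of X. Substituting gives Z = (a*b + c)*X + noise, so Cov(X, Z) = (a*b + c) Var(X). If the coefficients happen to satisfy c = -a*b, the two paths cancel and X looks unrelated to Z even though it is a cause of Z.

```python
def cov_x_z(a, b, c, var_x=1.0):
    """Cov(X, Z) when Y = a*X + noise and Z = b*Y + c*X + noise (independent noises)."""
    return (a * b + c) * var_x


a, b = 2.0, 0.5
c = -a * b                            # tuned so the direct and indirect paths cancel

unfaithful = cov_x_z(a, b, c)         # exactly zero: a dependence the graph does not predict away
perturbed = cov_x_z(a, b, c + 0.01)   # nudge one coefficient and the dependence reappears
```

<P>Any small change to a, b or c breaks the cancellation, which is the sense in which unfaithful distributions are fragile.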
<P>This may sound pretty arcane, but that's just because it <em>is</em> arcane.
The point, however, is that if you can make the three assumptions above (no
causal cycles, Markov property, faithfulness), you're in business in a really
remarkable way. There are very powerful statistical techniques that will let
you infer the causal structure connecting your variables. This comes in two
flavors. One is the Bayesian way: cook up a prior distribution over all
possible causal graphs; compute the likelihood of the data under each graph;
update your distribution over graphs; iterate. This is generally
computationally intractable, assuming you can come up with a meaningful prior
in the first place. The other approach is to use tests for conditional
independence to eliminate possible connections between variables, and so to
narrow down the range of candidate structures; it is basically frequentist, and
can be shown, under a broad range of circumstances, to be asymptotically
reliable.
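<P>Here is a brute-force sketch of the second approach, under strong simplifying assumptions: the model is linear and Gaussian, so conditional independence is the same as zero partial correlation, and we are handed the exact covariance matrix, so a simple threshold stands in for a real statistical test. It only recovers which pairs of variables are directly connected, not the direction of the arrows, and it tries every conditioning set, which would be hopeless for more than a handful of variables. The example model and all names are hypothetical.

```python
import itertools

import numpy as np


def partial_corr(cov, i, j, given):
    """Partial correlation of variables i and j given the variables in `given`."""
    idx = [i, j, *given]
    precision = np.linalg.inv(cov[np.ix_(idx, idx)])
    return -precision[0, 1] / np.sqrt(precision[0, 0] * precision[1, 1])


def skeleton(cov, tol=1e-9):
    """Keep an edge i-j unless some conditioning set makes i and j independent."""
    n = cov.shape[0]
    edges = set()
    for i, j in itertools.combinations(range(n), 2):
        others = [k for k in range(n) if k not in (i, j)]
        separated = any(
            abs(partial_corr(cov, i, j, subset)) < tol
            for size in range(len(others) + 1)
            for subset in itertools.combinations(others, size)
        )
        if not separated:
            edges.add((i, j))
    return edges


# Hypothetical linear model with unit-variance noise: 0 -> 2 <- 1, and 2 -> 3.
# coef[i, j] is the coefficient of variable j in the equation for variable i.
coef = np.zeros((4, 4))
coef[2, 0] = 0.8
coef[2, 1] = -0.6
coef[3, 2] = 1.5
mixing = np.linalg.inv(np.eye(4) - coef)   # X = mixing @ noise
cov = mixing @ mixing.T

found = skeleton(cov)
```

<P>With this covariance matrix, <code>found</code> contains exactly the three undirected edges of the true graph: 0 and 1 are separated by the empty set, and 3 is separated from 0 and from 1 by conditioning on 2. With real data one would replace the threshold by a test of conditional independence, and the answer would only be reliable in the large-sample limit.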
<P>Once you have your causal graph --- whether through estimation or through
simply being handed one --- you can do lots of great things with it, like
predict the effects of manipulating some of the variables, or make backward
inferences from effects to causes. Of course, if the graph is big, doing the
necessary calculations can be very troublesome in itself, and so people work on
approximation methods and even ways of doing statistical inference on models of
statistical distributions...
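<P>As an illustration of predicting the effect of a manipulation, take a hypothetical graph with binary variables where Z causes both X and Y, and X also causes Y. Conditioning on X = x tells us something about Z as well; setting X = x by intervention does not, so we drop X's own factor from the factorization and leave Z at its marginal distribution. All the numbers below are invented.

```python
# Hypothetical tables for the graph Z -> X, Z -> Y, X -> Y (all variables binary).
p_z = {0: 0.5, 1: 0.5}                       # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y = 1 | X = x, Z = z), keyed by (x, z)
                 (1, 0): 0.3, (1, 1): 0.7}


def p_x_given_z(x, z):
    """P(X = x | Z = z)."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y = 1 | X = x): seeing X = x also shifts our beliefs about Z."""
    numerator = sum(p_z[z] * p_x_given_z(x, z) * p_y1_given_xz[(x, z)] for z in p_z)
    denominator = sum(p_z[z] * p_x_given_z(x, z) for z in p_z)
    return numerator / denominator


def interventional(x):
    """P(Y = 1 | do(X = x)): X is set from outside, so Z keeps its marginal."""
    return sum(p_z[z] * p_y1_given_xz[(x, z)] for z in p_z)


naive_effect = observational(1) - observational(0)      # about 0.44
causal_effect = interventional(1) - interventional(0)   # about 0.20
```

<P>The naive contrast overstates the effect of X on Y because it mixes in the common cause Z; the interventional contrast is what the graph says would happen if we actually manipulated X.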
<P>It's probably obvious I think this is incredibly neat, and even one of the
most important ideas to come out
of <a href="learning-inference-induction.html">machine learning</a>. Of course
it doesn't <em>really</em> solve the problem of establishing causal relations,
in the way <a href="hume.html">Hume</a> objected to; it says, assuming there
are causal relations, of a certain stochastic form, and that these are stable,
then they can be learned. But that, and the more general questions of what we
ought to mean by "cause", deserve a <a href="causality.html">notebook of their
own</a>.
<P>Things I want to understand better: Structure discovery. (Once you know the
graph, parameter estimation, and even nonparametric estimation, is, by
construction, straightforward.) <a href="learning-theory.html">Statistical
learning theory</a> for graphical models. (The paper by Janzing and Herrmann
is a good start.) How to treat systems with feedback? How to
treat <a href="chaos.html">dynamical systems</a>
and <a href="time-series.html">time series</a>? How does all of this fit
together with <a href="computational-mechanics.html">computational
mechanics</a>?
<P>--- Causal discovery algorithms are of enough interest to me that
they <a href="causal-discovery-algorithms.html">deserve their own notebook</a>,
which should take a lot of material from this one, but for now that's mostly
just a reference list.
<ul>Recommended, more general:
<li>Clark Glymour, <cite>The Mind's Arrows: Bayes Nets and Graphical
Causal Models in Psychology</cite>
[<a href="../weblog/algae-2006-07.html#glymour-arrows">Mini-review</a>]
<li><a href="http://www.cs.berkeley.edu/~jordan/">Michael Irwin
Jordan</a> (ed.), <cite>Learning in Graphical Models</cite>
<li>Jordan and Sejnowski (eds.), <cite>Graphical Models</cite> [Nice
collection of papers from <cite>Neural Computation</cite>]
<li>Judea Pearl
<ul>
<li>"Causal Inference in Statistics: An Overview",
<cite>Statistics Surveys</cite> <strong>3</strong> (2009): 96--146
[<a href="http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf">PDF</a>]
<li><cite>Causality: Models, Reasoning and
Inference</cite>
</ul>
<li>Peter Spirtes, Clark Glymour and Richard Scheines, <cite>Causation,
Prediction, and Search</cite> [<a href="../weblog/algae-2009-12.html#SGS">Comments</a>]
</ul>
<ul>Recommended, more specialized:
<li>Vanessa Didelez, "Causal Reasoning for Events in Continuous Time: A Decision–Theoretic Approach", <a href="http://www.homepages.ucl.ac.uk/~ucgtrbd/uai2015_causal/papers/didelez.pdf">UAI 2015</a>
<li>Michael Eichler and Vanessa Didelez, "Causal Reasoning in Graphical Time Series Models", UAI 2007, <a href="http://arxiv.org/abs/1206.5246">arxiv:1206.5246</a>
<li>Antti Hyttinen, Frederick Eberhardt and Matti Järvisalo,
"Do-calculus when the True Graph Is Unknown", <a href="http://auai.org/uai2015/proceedings/papers/127.pdf">UAI 2015</a>
<li>Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", <a href="http://jmlr.csail.mit.edu/papers/v11/hyvarinen10a.html"><cite>Journal of Machine Learning Research</cite> <strong>11</strong>
(2010): 1709--1731</a>
<li>John C. Loehlin, <cite>Latent Variable Models: An Introduction to
Factor, Path, and Structural Analysis</cite> [An intro. to old-school linear
latent-variable models, especially of the sort used by psychologists. Good in
its own domain, but does not make enough contact with modern graphical models.]
<li>Marloes H. Maathuis, Diego Colombo, Markus Kalisch and Peter Bühlmann, "Predicting causal effects in large-scale systems from observational data", <a href="http://dx.doi.org/10.1038/nmeth0410-247"><cite>Nature Methods</cite> <strong>7</strong> (2010): 247--248</a> [PDF reprints of <a href="http://www.ccspmd.ethz.ch/news/publications/Buehlmann_10">paper</a> and <a href="http://stat.ethz.ch/Manuscripts/buhlmann/maathuisetal2010SI.pdf">supplementary information</a>]
<li>Marloes H. Maathuis, Markus Kalisch, Peter Bühlmann, "Estimating high-dimensional intervention effects from observational data", <cite>Annals
of Statistics</cite> <strong>37</strong> (2009): 3133--3164, <a href="http://arxiv.org/abs/0810.4214">arxiv:0810.4214</a>
<li>Emilija Perković, Johannes Textor, Markus Kalisch and
Marloes H. Maathuis, "A Complete Generalized Adjustment Criterion",
<a href="http://auai.org/uai2015/proceedings/papers/155.pdf">UAI 2015</a>, <a href="http://arxiv.org/abs/1507.01524">arxiv:1507.01524</a>
<li>Jonas Peters, Peter Bühlmann, "Structural Intervention Distance (SID) for Evaluating Causal Graphs", <a href="http://arxiv.org/abs/1306.1043">arxiv:1306.1043</a>
<li>Christopher J. Quinn, Todd P. Coleman, Negar Kiyavash, "Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes", <a href="http://arxiv.org/abs/1101.5108">arxiv:1101.5108</a>
<li>Christopher J. Quinn, Negar Kiyavash, Todd P. Coleman, "Directed Information Graphs", <a href="http://arxiv.org/abs/1204.2003">arxiv:1204.2003</a>
<li>Thomas S. Richardson and James M. Robins, "Single World
Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical
Approaches to Causality"
[<a href="http://www.csss.washington.edu/Papers/wp128.pdf">PDF preprint</a>,
140+ pp.; thanks to Betsy Ogburn for telling me about this]
<li>James Robins and Thomas Richardson, "Alternative Graphical Causal Models and the Identification of Direct Effects" [<a href="http://www.biostat.harvard.edu/robins/publications/wp100.pdf">PDF preprint</a>]
</ul>
<ul>Modesty forbids me to recommend:
<li>CRS, <cite><a href="http://www.stat.cmu.edu/~cshalizi/uADA/">Advanced Data Analysis from an Elementary Point of View</a></cite>, Part III (chapters on causal inference for statistics students)
<li>CRS and Andrew C. Thomas, "Homophily and Contagion Are Generically
Confounded in Observational Social Network
Studies", <a href="http://arxiv.org/abs/1004.4704">arxiv:1004.4704</a>
[<a href="http://bactra.org/weblog/656.html">Less-technical weblog version</a>]
<li>Octavio Mesner, Alex Davis, Elizabeth Casman, Hyagriv Simhan, CRS, Lauren Keenan-Devlin, Ann Borders and Tamar Krishnamurti, "Using graph learning to understand adverse pregnancy outcomes and stress pathways", <a href="https://doi.org/10.1371/journal.pone.0223319"><cite>PLoS One</cite> <strong>14</strong> (2019): e0223319</a>
</ul>
<ul>To read:
<li>Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", <a href="http://arxiv.org/abs/1107.1283">arxiv:1107.1283</a>
[This sounds very much like Spearman's "tetrad equations" from 100 years ago!]
<li>Holly Andersen, "When to expect violations of causal faithfulness and why it matters", <a href="http://philsci-archive.pitt.edu/9204/">phil-sci/9204</a>
<li>Clive G. Bowsher, "Stochastic kinetic models: Dynamic independence,
modularity and
graphs", <a href="http://projecteuclid.org/euclid.aos/1278861248"><cite>Annals
of Statistics</cite>
<strong>38</strong> (2010): 2242--2281</a>
<li>Elias Chaibub Neto, Mark P. Keller, Alan D. Attie, and Brian S. Yandell, "Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes", <a href="http://projecteuclid.org/euclid.aoas/1273584457"><cite>Annals of Applied Statistics</cite> <strong>4</strong>
(2010): 320--339</a>
<li>David Cox and Nanny Wermuth, <cite>Multivariate Dependencies: Models, Analysis, and Interpretation</cite>
<li>Vanessa Didelez, "Graphical models for marked point processes based on local independence", <a href="http://arxiv.org/abs/0710.5874">arxiv:0710.5874</a>
<li>Vanessa Didelez, Svend Kreiner and Niels Keiding, "Graphical Models
for Inference Under Outcome-Dependent
Sampling", <a href="http://projecteuclid.org/euclid.ss/1294167965"><cite>Statistical
Science</cite> <strong>25</strong> (2010): 368--387</a>, <a href="http://arxiv.org/abs/1101.0901">arxiv:1101.0901</a>
<li>Mathias Drton, Rina Foygel, and Seth Sullivant, "Global identifiability of linear structural equation models", <a href="http://projecteuclid.org/euclid.aos/1299680957"><cite>Annals of
Statistics</cite> <strong>39</strong> (2011): 865--886</a>
<li>Seif Eldawlatly, Yang Zhou, Rong Jin
and Karim G. Oweiss, "On the Use of Dynamic Bayesian Networks in Reconstructing Functional Neuronal Networks from Spike Train Ensembles", <a href="http://dx.doi.org/"><cite>Neural Computation</cite> <strong>22</strong> (2010): 158--189</a>
<li>Sergi Elizalde and Kevin Woods, "Bounds on the number of inference
functions of a graphical
model", <a href="http://arxiv.org/abs/math.CO/0610233">math.CO/0610233</a>
<li>Freedman, "On Specifying Graphical Models for Causation,"
UCB Stat. Tech. Rep. 601 [<a
href="http://www.stat.berkeley.edu/tech-reports/601.abstract">abstract</a>, <a
href="http://www.stat.berkeley.edu/~census/601.pdf">pdf</a>]
<li>Green, Hjort and Richardson (eds.), <Cite>Highly Structured
Stochastic Systems</cite>
<li>Sanjiang Li, "Causal models have no complete axiomatic
characterization", <a href="http://arxiv.org/abs/0804.2401">arxiv:0804.2401</a>
<li>Philipp Rütimann and Peter Bühlmann, "High
dimensional sparse covariance estimation via directed acyclic graphs",
<a href="http://arxiv.org/abs/0911.2375">arxiv:0911.2375</a> = <a href="http://projecteuclid.org/euclid.ejs/1259677088"><cite>Electronic
Journal of Statistics</cite> <strong>3</strong> (2009): 1133--1160</a>
<li>Dino Sejdinovic, Arthur Gretton, Wicher Bergsma, "A Kernel Test for Three-Variable Interactions", <a href="http://papers.nips.cc/paper/4893-a-kernel-test-for-three-variable-interactions">NIPS 2013</a>
<li>Bill Shipley, <cite>Cause and Correlation in Biology: A User's
Guide to Path Analysis, Structural Equations and Causal Inference</cite>
<li>Ilya Shpitser, Judea Pearl, "Complete Identification Methods for
the Causal
Hierarchy", <a
href="http://jmlr.csail.mit.edu/papers/v9/shpitser08a.html"><cite>Journal of
Machine Learning Research</cite> <strong>9</strong> (2008): 1941--1979</a> ["We
consider a hierarchy of queries about causal relationships in graphical models,
where each level in the hierarchy requires more detailed information than the
one below. The hierarchy consists of three levels: associative relationships,
derived from a joint distribution over the observable variables; cause-effect
relationships, derived from distributions resulting from external
interventions; and counterfactuals, derived from distributions that span
multiple "parallel worlds" and resulting from simultaneous, possibly
conflicting observations and interventions. We completely characterize cases
where a given causal query can be computed from information lower in the
hierarchy"]
<li>Ricardo Silva, Richard Scheines, Clark Glymour, Peter L. Spirtes, "Learning Measurement Models for Unobserved Variables", UAI 2003, <a href="http://arxiv.org/abs/1212.2516">arxiv:1212.2516</a>
<li>Peter Spirtes
<ul>
<li>"Graphical models, causal inference, and
econometric models", <cite>Journal of Economic Methodology</citE> <strong>12</strong> (2005): 1--33 [<a href="http://www.hss.cmu.edu/philosophy/spirtes/jem05.pdf">PDF</a>]
<li>"Introduction to Causal Inference", <a href="http://jmlr.csail.mit.edu/papers/v11/spirtes10a.html"><cite>Journal
of Machine Learning Research</cite> <strong>11</strong> (2010): 1643--1662</a>
<li>"Variable Definition and Causal
Inference",
Proceedings of the 13th International Congress of Logic Methodology and Philosophy of Science, pp. 514--53 [<a href="https://www.cmu.edu/dietrich/philosophy/docs/spirtes/lmps13.doc">PDF reprint via Prof. Spirtes</a>]
</ul>
<li>Achim Tresch, Florian Markowetz, "Structure Learning in Nested
Effects Models", <a href="http://arxiv.org/abs/0710.4481">arxiv:0710.4481</a>
<li>Sara van de Geer, "On the uniform convergence of empirical norms and inner products, with application to causal inference", <a href="http://arxiv.org/abs/1310.5523">arxiv:1310.5523</a>
<li>Tyler J. VanderWeele and James M. Robins
<ul>
<li>"Minimal sufficient causation and directed acyclic graphs", <a href="http://dx.doi.org/10.1214/08-AOS613"><cite>Annals of Statistics</cite> <strong>37</strong> (2009): 1437--1465</a>
<li>"Properties of Monotonic Effects on Directed
Acyclic Graphs", <a href="http://jmlr.csail.mit.edu/papers/v10/vanderweele09a.html"><citE>Journal of Machine Learning
Research</citE> <strong>10</strong> (2009): 699--718</a>
<li>"Signed directed acyclic graphs for causal inference",
<a href="http://dx.doi.org/10.1111/j.1467-9868.2009.00728.x"><cite>Journal of the Royal Statistical Society</cite> B <strong>72</strong>
(2010): 111--127</a>
</ul>
<li>Vivian Viallon, Onureena Banerjee, Gregoire Rey, Eric Jougla, Joel Coste, "An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates", <a href="http://arxiv.org/abs/1004.2287">arxiv:1004.2287</a>
<LI>Jiji Zhang, "Causal Reasoning with Ancestral Graphs",
<a href="http://jmlr.csail.mit.edu/papers/v9/zhang08a.html"><cite>Journal of
Machine Learning Research</cite>
<strong>9</strong> (2008): 1437--1474</a>
<li>Jiji Zhang and Peter Spirtes
<ul>
<li>"Detection of Unfaithfulness and Robust Causal
Inference", <a href="http://philsci-archive.pitt.edu/archive/00003188/">phil-sci/3188</a>
<li>"Strong Faithfulness and Uniform Consistency in Causal Inference", UAI 2003, <a href="http://arxiv.org/abs/1212.2506">arxiv:1212.2506</a> [I can't remember if I read this back in 2003 or not]
</ul>
</ul>
<P>(Thanks to Gustavo Lacerda for pointing out a goof.)