Graphical Models

Last update: 21 Apr 2025 21:17
First version: 3 November 2001

That is, statistical models in which are represented by graphs or networks: random variables are nodes, and relationships of direct statistical dependence are shown as edges. I'm sticking latent-variable and path-analysis models in here, too, because they all pretty much work the same way. It's the causal modeling angle which got me interested in this originally, and which I still think is the most important, but there are many other uses.

(Some people call all of these "Bayesian graphs" or "Bayesian networks", using "Bayes" as a metonym for "conditional probability" — there are perfectly good frequentist interpretations of these models. Others draw a distinction between "Bayesian networks", with directed edges between variables, and "Markov networks", with undirected edges. The distinction between directed and undirected is important, but "Bayes" and "Markov" don't really convey it, since directed graphs have Markov properties and one uses Bayes's rule all the time in undirected graphs.)

Things I want to understand better: frequentist inference procedures. Computational learning theory for graphical models (the paper by Janzing and Herrmann is good). How to treat directed cyclic graphs? How to treat dynamical systems and time series?

Not even a conjecture. Back in the 1960s, Chow and Liu (reference below) gave a polynomial-time algorithm for finding the best approximation to a global joint probability distribution using only pairwise interactions among the variables, i.e., the one which minimized the Kullback-Leibler divergence between the true and the approximating distribution. I have read that extending this to even three-way interactions is NP-hard, though I don't know if it's NP-complete. (1) How is the intractability result established? (2) Is this the same as the computational phase transition one finds in going from 2-SAT to 3-SAT, where the critical point is at two-point-something SAT? (Presumably the answer to (1) would shed some light on this.) (3) Even if not, is there an analogous phase transition, perhaps in a different universality class? (Update in 2009, several years later: Bento and Montanari, below, sounds relevant, but I haven't read it yet.)

Factor Models
Graph Sampling Algorithms (for answering "what would a typical graphical model do?")
Machine Learning Markov Models
Spatial Statistics and Spatial Stochastic Processes
Stochastic Processes

Michael Irwin Jordan (ed.), Learning in Graphical Models
Jordan and Sejnowski (eds.), Graphical Models [Nice collection of papers from Neural Computation]
Steffen Lauritzen, Graphical Models [A fairly abstract, but quite elegant, probabilistic/mathematical-statistical treatment.]
Judea Pearl, Probabilistic Reasoning in Intelligent Systems [Though there are some ideas here about AI which seem unhelpful]

Genevera I. Allen, Zhandong Liu, "A Log-Linear Graphical Model for Inferring Genetic Networks from High-Throughput Sequencing Data", arxiv:1204.3941
Gavin Brown, Adam Pocock, Ming-Jie Zhao, Mikel Luján, "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection", Journal of Machine Learning Research 13 (2012): 27--66
Venkat Chandrasekaran, Pablo A. Parrilo, Alan S. Willsky, "Latent variable graphical model selection via convex optimization", Annals of Statistics 40 (2012): 1935--1967, arxiv:1008.1290
C. K. Chow and C. N. Liu, "Approximating Discrete Probability Distributions with Dependence Trees", IEEE Transactions on Information Theory 14 (1968): 462--467 [An old but very nice result on how to get the optimal approximation to a global probability distribution using only pairwise interactions among the variables]
Diego Colombo, Marloes H. Maathuis, Markus Kalisch, Thomas S. Richardson, "Learning high-dimensional directed acyclic graphs with latent and selection variables", arxiv:1104.5617
David Danks, Unifying the Mind: Cognitive Representations as Graphical Models
Vanessa Didelez, "Causal Reasoning for Events in Continuous Time: A Decision–Theoretic Approach", UAI 2015
Finale Doshe-Velez, David Wingate, Joshua Tenenbaum and Nicholas Roy, "Infinite Dynamic Bayesian Networks", ICML 2011
Zubin Ghahramani, "Learning Dynamic Bayesian Networks," in Giles and Gori (eds.), Adaptive Processing of Sequences and Data Structures
Zubin Ghahramani and Michael Irwin Jordan, "Factorial Hidden Markov Models," Machine Learning 29 (1997): 245--273
Fang Han, Han Liu, Brian Caffo, "Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model", arxiv:1310.3223
Dominik Janzing and Daniel J. L. Herrmann, "Reliable and Efficient Inference of Bayesian Networks from Sparse Data by Statistical Learning Theory", cs.LG/0309015
Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman, "The Nonparanormal SKEPTIC", ICML 2012, arxiv:1206.6488
Han Liu, John Lafferty and Larry Wasserman, "The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs", Journal of Machine Learning Research 10 (2009): 2295--2328 = arxiv:0903.0649
John C. Loehlin, Latent Variable Models: An Introduction to Factor, Path, and Structural Analysis [An intro. to old-school linear latent-variable models, especially of the sort used by psychologists. Good in its own domain, but does not make enough contact with modern graphical models.]
Eric Mjolsness, "Stochastic Process Semantics for Dynamical Grammar Syntax: An Overview", cs.AI/0511073
Karthika Mohan, Judea Pearl and Jin Tian, "Graphical Models for Inference with Missing Data", NIPS 2013 [There was at least one preprint version with the more pointed title "Missing Data as a Causal Inference Problem"]
Jennifer Neville and David Jensen, "Relational Dependency Networks", Journal of Machine Learning Research 8 (2007): 653--692
Christopher J. Quinn, Todd P. Coleman, Negar Kiyavash, "Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes", arxiv:1101.5108
Pradeep Ravikumar, Martin J. Wainwright, and John D. Lafferty, "High-dimensional Ising model selection using $\ell_1$-regularized logistic regression", Annals of Statistics 38 (2010): 1287--1319, arxiv:0804.4202
Bo Thiesson, David Maxwell Chickering, David Heckerman, Christopher Meek, "ARMA Time-Series Modeling with Graphical Models", pp. 552--560 in UAI 2004, arxiv:1207.4162
Ilya Shpitser, Karthika Mohan and Judea Pearl, "Missing Data as a Causal and Probabilistic Problem", UAI 2015
Greg Ver Steeg, Aram Galstyan, "Discovering Structure in High-Dimensional Data Through Correlation Explanation", arxiv:1406.1222
Stephen Walsh and Paul Whitney, "A Graphical Approach to Diagnosing the Validity of the Conditional Independence Assumptions of a Bayesian Network Given Data", Journal of Computational and Graphical Statistics 21 (2012): 961--978 [Cute, for small networks with discrete variables.]
Pawel Wocjan, Dominik Janzing, and Thomas Beth, "Required sample size for learning sparse Bayesian networks with many variables," cs.LG/0204052
Eric P. Xing, Michael I. Jordan, Stuart Russell, "A Generalized Mean Field Algorithm for Variational Inference in Exponential Families", UAI 2003, arxiv:1212.2512

Edoardo M. Airoldi, "Getting started in probabilistic graphical models", arxiv:0706.2040 [Tutorial aimed at biologists]
Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", arxiv:1107.1283 [This sounds very much like Spearman's "tetrad equations" from 100 years ago!]
A. Anandkumar, D. Hsu, S. M. Kakade, "Learning High-Dimensional Mixtures of Graphical Models", arxiv:1203.0697
Animashree Anandkumar, Vincent Y.F. Tan, Alan. S. Willsky, "High-Dimensional Gaussian Graphical Model Selection: Tractable Graph Families", arxiv:1107.1270
Animashree Anandkumar and Ragupathyraj Valluvan, "Learning loopy graphical models with latent variables: Efficient methods and guarantees", Annals of Statistics 41 (2013): 401--435
Erik Aurell, Magnus Ekeberg, "Inverse Ising inference using all the data", arxiv:1107.3536
Francis R. Bach and Michael I. Jordan, "Learning Graphical Models for Stationary Time Series", UCB Statistics Tech. Rep. 650
Onureena Banerjee, Laurent El Ghaoui, Alexandre d'Aspremont, "Model Selection Through Sparse Maximum Likelihood Estimation", arxiv:0707.0704
Jose Bento, Morteza Ibrahimi and Andrea Montanari, "Learning Networks of Stochastic Differential Equations", NIPS 23 (2010) [PDF]
Jose Bento, Andrea Montanari, "Which graphical models are difficult to learn?", arxiv:0910.5761
Danny Bickson, Carlos Guestrin, "Linear Characteristic Graphical Models: Representation, Inference and Applications", arxiv:1008.5325
David Brillinger, "Remarks Concerning Graphical Models for Time Series and Point Processes," Revista de Econometria 16 (1996): 1--23 [PS]
Clive G. Bowsher, "Stochastic kinetic models: Dynamic independence, modularity and graphs", Annals of Statistics 38 (2010): 2242--2281
Matthias Brocheler and Lise Getoor, "Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning", NIPS 23 (2010) [PDF]
Venkat Chandrasekaran, Pablo A. Parrilo, and Alan S. Willsky, "Latent variable graphical model selection via convex optimization", Annals of Statistics 40 (2012): 1935--1967, arxiv:1008.1290
R. Chaves, L. Luft, T. O. Maciel, D. Gross, D. Janzing, B. Schölkopf, "Inferring latent structures via information inequalities", UAI 2014, arxiv:1407.2256
Jie Cheng, Elizaveta Levina, Ji Zhu, "High-dimensional Mixed Graphical Models", arxiv:1304.2810
Michael Chertkov and Vladimir Y. Chernyak
- "Loop calculus in statistical physics and information science", Physical Review E 73 (2006): 065102 = cond-mat/0601487
- "Loop series for discrete statistical models on graphs", cond-mat/0603189
David Maxwell Chickering, "Optimal Structure Identification With Greedy Search," Journal of Machine Learning Research 3 (2002): 507--554
Robert G. Cowell, A. Philip Dawid, Steffen L. Lauritzen and David J. Spiegelhalter, Probabilistic Networks and Expert Systems
David Cox and Nanny Warmuth, Multivariate Dependcencies: Models, Analysis, and Interpretation
Mihai Cucuringu, Jesus Puente, David Shue, "Model Selection in Undirected Graphical Models with the Elastic Net", arxiv:1111.0559
Rainer Dahlhaus, "Graphical interaction models for multivariate time series," Metrika 51 (2000): 157--172
Bin Dai, Shilin Ding, Grace Wahba, "Multivariate Bernoulli Distribution", arxiv:1206.1874
Gabriel C. G. de Abreu, Rodrigo Labouriau, David Edwards, "High-dimensional Graphical Model Search with gRapHD R Package", Journal of Statistical Software 37 (2010), arxiv:0909.1234
Luis M. de Campos, "A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests", Journal of Machine Learning Research 7 (2006): 2149--2187 [Sounds damn cool]
Amir Dembo and Andrea Montanari, "Gibbs Measures and Phase Transitions on Sparse Random Graphs", arxiv:0910.5460
Amir Dembo, Andrea Montanari, Nike Sun, "Factor models on locally tree-like graphs", arxiv:1110.4821
Pedro Domingos and Daniel Lowd, Markov Logic: An Interface Layer for Artificial Intelligence
Vanessa Didelez, "Graphical models for marked point processes based on local independence", arxiv:0710.5874
Mathias Drton, Rina Foygel, and Seth Sullivant, "Global identifiability of linear structural equation models", Annals of Statistics 39 (2011): 865--886
Michael Eichler
- "Fitting Graphical Interaction Models to Multivariate Time Series", UAI 2006, arxiv:1206.6839
- "Graphical modelling of multivariate time series", math.ST/0610654
- "A note on global Markov properties for mixed graphs", arxiv:1107.3036
Seif Eldawlatly, Yang Zhou, Rong Jin and Karim G. Oweiss, "On the Use of Dynamic Bayesian Networks in Reconstructing Functional Neuronal Networks from Spike Train Ensembles", Neural Computation 22 (2010): 158--189
Gal Elidan, Iftach Nachman and Nir Friedman, "'Ideal Parent' Structure Learning for Continuous Variable Bayesian Networks", Journal of Machine Learning Research 8 (2007): 1799--1833
Sergi Elizalde and Kevin Woods, "Bounds on the number of inference functions of a graphical model", math.CO/0610233
Robin J. Evans and Thomas S. Richardson, "Markovian acyclic directed mixed graphs for discrete data", Annals of Statistics 42 (2014): 1452--1482
Richard G. Everitt, "Bayesian Parameter Estimation for Latent Markov Random Fields and Social Networks", Journal of Computational and Graphical Statistic 21 (2012): 940--960
Juan Ferrándiz, Enrique F. Castillo and Pilar Sanmartin, "Temporal aggregation in chain graph models", Journal of Statistical Planning and Inference 133 (2005): 69--93
G. David Forney, Jr., Pascal O. Vontobel, "Partition Functions of Normal Factor Graphs", arxiv:1102.0316
Frey, Graphical Models in Machine Learning and Data Communication
Roland Fried and Vanessa Didelez, "Latent variable analysis and partial correlation graphs for multivariate time series", Statistics and Probability Letters 73 (2005): 287--296
Cyril Furtlehner, Jean-Marc Lasgouttes, Arnaud De La Fortelle, "Belief Propagation and Bethe approximation for Traffic Prediction", physics/0703159
Dan Geiger, David Heckerman, Henry King, and Christopher Meek, "Stratified exponential families: Graphical models and model selection", Annals of Statistics 29 (2001): 505--529
Christophe Giraud, "Estimation of Gaussian graphs by model selection", arxiv:0710.2044
Vibhav Gogate, William Austin Webb, and Pedro Domingos, "Learning Efficient Markov Networks", NIPS 23 (2010) [PDF]
Green, Hjort and Richardson (eds.), Highly Structured Stochastic Systems
Vikas Hamine and Paul Helman, "Learning Optimal Augmented Bayes Networks", cs.LG/0509055
Patrick L. Harrington, Jr., and Alfred O. Hero III, "Spatio-Temporal Graphical Model Selection", arxiv:1004.2304
Raymond Hemmecke, Silvia Lindner, Milan Studeny, "Learning restricted Bayesian network structures", arxiv:1011.6664
Holger Höfling and Robert Tibshirani, "Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods", Journal of Machine Learning Research 10 (2009): 883--906
Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, "Stochastic Reasoning, Free Energy, and Information Geometry", Neural Computation 16 (2004): 1779--1810
Manfred Jaeger & co., Primula [Java implementation of a modeling language for relational Bayesian networks; released under GPL]
Ali Jalali, Chris Johnson, Pradeep Ravikumar, "On Learning Discrete Graphical Models Using Greedy Methods", arxiv:1107.3258
Kohler and Friedman, Graphical Models
Mladen Kolar, Eric P. Xing, "Estimating Networks With Jumps", arxiv:1012.3795
Nicole Kraemer, Juliane Schaefer, Anne-Laure Boulesteix, "Regularized estimation of large-scale gene association networks using graphical Gaussian models", arxiv:0905.0603
John Lafferty, Han Liu, Larry Wasserman, "Sparse Nonparametric Graphical Models", arxiv:1201.0794
Sophie Lèbre, "Inferring dynamic genetic networks with low order independencies", arxiv:0704.2551
Shai Litvak and Shimon Ullman, "Cortical Circuitry Implementing Graphical Models", Neural Computation 21 (2009): 3010--3056 [or rather, models of cortical circuitry implementing graphical models]
Han Liu, Xi Chen, John Lafferty, Larry Wasserman, "Graph-Valued Regression", arxiv:1006.3972
Han Liu, John Lafferty and Larry Wasserman, "Tree Density Estimation", arxiv:1001.1557
Han Liu, Kathryn Roeder, Larry Wasserman, "Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models", arxiv:1006.3316 == NIPS 23 (2010)
Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, Larry Wasserman, "Forest Density Estimation", Journal of Machine Learning Research 12 (2011): 907--951
Stephen Luttrell, "Adaptive Cluster Expansion (ACE): A Hierarchical Bayesian Network", cs.NE/0410020
Marc Maier, Katerina Marazopoulou, David Jensen, "Reasoning about Independence in Probabilistic Models of Relational Data", arxiv:1302.4381
Giovanni M. Marchetti, Nanny Wermuth, "Matrix representations and independencies in directed acyclic graphs", arxiv:0904.0333
Lilyana Mihalkova, Lise Getoor, "Lifted Graphical Models: A Survey", arxiv:1107.4966
Eric Mjolsness, "Labeled graph notations for graphical models", UCI School of Information and Computer science Technical Report 04-03 [PDF]
Andrea Montanari, "Graphical Models Concepts in Compressed Sensing", arxiv:1011.4328
Guido Montufar, Johannes Rauh, Nihat Ay, "Maximal Information Divergence from Statistical Models defined by Neural Networks", arxiv:1303.0268
E. Mossel, S. Roch and A. Sly, "Robust Estimation of Latent Tree Graphical Models: Inferring Hidden States With Inexact Parameters", IEEE Transactions on Information Theory 59 (2013): 4357--4373
Ilias Moysidis, Bing Li, "Simultaneous Estimation of Graphical Models by Neighborhood Selection", arxiv:2206.08120
Mukund Narasimhan, Jeff A. Bilmes, "PAC-learning bounded tree-width Graphical Models", arxiv:1207.4151
Henrik Nyman, Johan Pensar, Timo Koski, Jukka Corander, "Stratified Graphical Models: Context-Specific Independence in Graphical Models", Bayesian Analysis 9 (2014): 883--908, arxiv:1309.6415
Lior Pachter and Bernd Sturmfels
- "Tropical Geometry of Statistical Models", q-bio.QM/0311009
- "Parametric Inference for Biological Sequence Analysis", q-bio.GN/0401033
Alessandro Pelizzola, "Cluster variation method in statistical physics and probabilistic graphical models", Journal of Physics A: Mathematical and General 38 (2005): R309--R339 = cond-mat/0508216
Jose M. Pena
- "Reading Dependencies from Covariance Graphs", arxiv:1010.4504
- "Every LWF and AMP chain graph originates from a set of causal models", arxiv:1312.2967
Jose M. Pena, Roland Nilsson, Johan Björkegren, Jesper Tegnér, "Identifying the Relevant Nodes Without Learning the Model", UAI 2006, arxiv:1206.6847
Tapani Raiko, Harri Valpola, Markus Harva and Juha Karhunen, "Building Blocks for Variational Bayesian Learning of Latent Variable Models", Journal of Machine Learning Research 8 (2007): 155--201
Johannes Rauh, Nihat Ay, "Robustness and Conditional Independence Ideals", arxiv:1110.1338
Marco Reale, A Graphical Modelling Approach to Time Series, Ph.D. thesis, Lancaster University, 1998 [Reale's website]
Marco Reale and Granville Tunnicliffe Wilson
- "Identification of vector AR models with recursive structural errors using conditional independence graphs"
- "The Sampling Properties of Conditional Independence Graphs for Structural Vector Autoregressions"
T. Rizzo, B. Wemmenhove, H.J. Kappen, "On Cavity Approximations for Graphical Models", cond-mat/0608312
Joshua W. Robinson, Alexander J. Hartemink, "Learning Non-Stationary Dynamic Bayesian Networks", Journal of Machine Learning Research 11 (2010): 3647--3680
Philipp Rütimann and Peter Bühlmann, "High dimensional sparse covariance estimation via directed acyclic graphs", arxiv:0911.2375 = Electronic Journal of Statistics 3 (2009): 1133--1160
Kayvan Sadeghi, Steffen L. Lauritzen, "Markov Properties for Loopless Mixed Graphs", arxiv:1109.5909
Federico Schlüter, "A survey on independence-based Markov networks learning", arxiv:1108.2283
Marco Scutari
- "Learning Bayesian Networks with the bnlearn Package", arxiv:0908.3817
- "Measures of Variability for Bayesian Network Graphical Structures", arxiv:1005.4214
Timo J. Septer, Jacob Dijkstra and Frans N. Stokman, "Detecting and measuring crucial differences between cognitive maps", Rationality and Society 24 (2012): 383--407
Tomi Silander, Teemu Roos, Petri Myllymaki, "Locally Minimax Optimal Predictive Modeling with Bayesian Networks", Journal of Machine Learning Research Workshop and Conference Proceedings 5 (AISTATS 2009): 504--511
Ricardo Silva, Robert B. Gramacy, "Gaussian Process Structural Equation Models with Latent Variables", arxiv:1002.4802 [Heard the talk at UAI 2010, but I want the details.]
Milan Studeny, Probabilistic Conditional Independence Structures ["The main topic of the monograph is a non-graphical algebraic method for describing probabilistic CI structures. However, one of the first two chapters in the book recalls and gathers basic mathematical tools for study of probabilistic conditional independence (CI) and the other one is a sketchy overview of recent advanced graphical approaches to the desciption of CI structures. The next four chapters develop the non-graphical method. The last standard chapter is an attempt to apply the method in practice: it is devoted to learning Bayesian nets and it is more mathematical (and 'philosophical') revision of some methods for learning Bayesian networks. The main aim of that chapter is to indicate that an algebraic approach can also be applied in this area."]
Charles Sutton, Andrew McCallum, "An Introduction to Conditional Random Fields", arxiv:1011.4088
Charles Sutton, Andrew McCallum and Khashayar Rohanimanesh, "Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data", Journal of Machine Learning Research 8 (2007): 693--723
Vincent Y. F. Tan, Animashree Anandkumar, Lang Tong and Alan S. Willsky, "A Large-Deviation Analysis of the Maximum-Likelihood Learning of Markov Tree Structures", IEEE Transactions on Information Theory 57 (2011): 1714--1735, arxiv:0905.0940 [Large deviations for Chow-Liu trees]
Sara van de Geer, Peter Bühlmann, "$\ell_0$-penalized maximum likelihood for sparse directed acyclic graphs", arxiv:1205.5473
Guy Van den Broeck, Kristian Kersting, Sriraam Natarajan, David Poole (eds.), ] ';An Introduction to Lifted Probabilistic Inference
Vivian Viallon, Onureena Banerjee, Gregoire Rey, Eric Jougla, Joel Coste, "An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates", arxiv:1004.2287
Martin J. Wainwright and Michael I. Jordan, "Graphical Models, Exponential Families, and Variational Inference", Foundations and Trends in Machine Learning 1 (2008): 1--305 [PDF reprint via Prof. Jordan]
Nanny Wermuth, "Probability distributions with summary graph structure", Bernoulli 17 (2011): 845--879, arxiv:1003.3259
Nanny Wermuth, D.R. Cox, "Concepts and a case study for a flexible class of graphical Markov models", arxiv:1303.1436
David H. Wolpert, "Uncertainty relations and fluctuation theorems for Bayes nets", arxiv:1911.02700
Lingzhou Xue, Hui Zou, "Regularized rank-based estimation of high-dimensional nonparanormal graphical models", Annals of Statistics 40 (2012): 2541--2571, arxiv:1302.3082
Gui-Bo Ye, Yuanfeng Wang, Yifei Chen, Xiaohui Xie, "Efficient Latent Variable Graphical Model Selection via Split Bregman Method", arxiv:1110.3076
Jonathan S. Yedidia, William T. Freeman and Yair Weiss, "Understanding Belief Propagation and its Generalizations", Mitsubshi Electric Research Laboratories Tech. Rep. 2001-22
Marco Zaffalon and Marcus Hutter, "Robust Inference of Trees", cs.LG/0511087
Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman, "The huge Package for High-dimensional Undirected Graph Estimation in R", Journal of Machine Learning Research 13 (2012): 1059--1062
Shuheng Zhou, John Lafferty, Larry Wasserman, "Time Varying Undirected Graphs", arxiv:0802.2758
Or Zuk, Shiri Margel, Eytan Domany, "On the Number of Samples Needed to Learn the Correct Structure of a Bayesian Network", UAI 2006, arxiv:1206.6862
Piotr Zwiernik, "An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models", Journal of Machine Learning Research 12 (2011): 3283--3310 [where "general Markov models" == "binary graphical tree models where all the inner nodes of a tree represent binary hidden variables"]