## Exponential Families of Probability Measures

*27 Feb 2017 16:30*

I should explain what these are, but having done so elsewhere, I am disinclined to do it again. (Later, I should just copy that text.)

I am particularly interested in exponential families for time series (very natural for Markov models) and for networks. More generally, if I have a family of stochastic processes (collections of dependent random variables) which form exponential families, what constraints does that put on the process?
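To make the "very natural for Markov models" remark concrete, here is a minimal sketch, assuming a finite-state, time-homogeneous chain and conditioning on the first state. It checks numerically that the log-likelihood of a path is linear in the matrix of transition counts, which is the exponential-family form. The function names, the transition matrix `P` and the example path are all made up for illustration, and the sketch glosses over the constraint that each row of `P` sums to one (so it says nothing about minimality or curvature).

```python
import numpy as np

def transition_counts(path, k):
    """Sufficient statistic: counts[i, j] = number of i -> j transitions in the path."""
    counts = np.zeros((k, k), dtype=int)
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1
    return counts

def log_likelihood(path, P):
    """Log-probability of the path given its first state, summed step by step."""
    return float(sum(np.log(P[a, b]) for a, b in zip(path[:-1], path[1:])))

# Hypothetical two-state chain and sample path, for illustration only.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
path = [0, 0, 1, 1, 0, 1, 0, 0]

N = transition_counts(path, k=2)
theta = np.log(P)  # log transition probabilities play the role of natural parameters

direct = log_likelihood(path, P)
via_statistic = float(np.sum(theta * N))  # inner product of parameters and statistics

print(N)
print(direct, via_statistic)
```

The two printed log-likelihoods agree up to floating-point error, so the path enters the likelihood only through the count matrix `N`.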

Exponential families correspond to canonical ensembles in statistical mechanics. (Natural sufficient statistics : natural parameters :: extensive macroscopic variables : conjugate intensive variables.) In statistical mechanics, one of the justifications for using canonical ensembles for large systems comes from large deviations theory. Is there something equivalent in statistics proper? (Roussas's results on local asymptotic approximation of parametric models by exponential families feel like they should be connected here.)
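For reference, the correspondence can be written out in the standard textbook notation; the symbols \( h \), \( T \), \( A \), \( E \), \( \beta \) and \( Z \) are my choice here, not anything fixed by the sources below. An exponential family with natural parameter \( \theta \) and natural sufficient statistic \( T \) has densities

\[
p_\theta(x) = h(x) \exp\left\{ \theta \cdot T(x) - A(\theta) \right\}, \qquad A(\theta) = \log \int h(x) \, e^{\theta \cdot T(x)} \, dx ,
\]

while the canonical ensemble at inverse temperature \( \beta \) for a system with energy function \( E \) is

\[
p_\beta(x) = \frac{e^{-\beta E(x)}}{Z(\beta)}, \qquad Z(\beta) = \int e^{-\beta E(x)} \, dx .
\]

Matching terms gives \( T = E \) (the extensive variable), \( \theta = -\beta \) (the conjugate intensive variable), \( h \equiv 1 \), and \( A(\theta) = \log Z(\beta) \), so the log-partition function of statistics is the log of the partition function of statistical mechanics.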

See also: Information geometry; Large deviations; Maximum entropy; Statistical mechanics; Statistics in general; Sufficient statistics

- Recommended, big picture:
- Lawrence D. Brown, Fundamentals of Statistical Exponential Families: with Applications in Statistical Decision Theory [All you ever wanted to know, really. Now open access.]
- Benoit Mandelbrot, "The Role of Sufficiency and of Estimation in Thermodynamics", Annals of Mathematical Statistics **33**(1962): 1021--1038 [Still one of the best discussions of the interplay between formal, statistical and substantive motivations for exponential families.]
- Mark Schervish's Theory of Statistics [Exponential families are central enough to statistical theory that any good textbook will have decent coverage of the same key topics, but I found Mark's treatment particularly clear and streamlined *before* he became my department chair.]

- Recommended, close-ups:
- Peter Guttorp, Stochastic Modeling of Scientific Data [Gives nice discussions and examples of using exponential families, and their properties, to model dependent data]
- Rudolf Kulhavy, Recursive Nonlinear Estimation: A Geometric Approach [Emphasizing information-geometric aspects]
- Steffen L. Lauritzen, Extremal Families and Systems of Sufficient Statistics [Mini-review.]
- Steffen L. Lauritzen, "Extreme Point Models in Statistics", Scandinavian Journal of Statistics **11**(1984): 65--91 [Highlights of the book, without proofs but with decent typography. Includes some of his very interesting algebraic extensions to the usual notions of exponential families. JSTOR]

- George G. Roussas, Contiguity of Probability Measures: Some Applications in Statistics [Asymptotic theory of exponential-family approximation, estimation and testing, for discrete-time Markov processes on fairly general state-spaces. Mini-review]
- Eric P. Xing, Michael I. Jordan, Stuart Russell, "A Generalized Mean Field Algorithm for Variational Inference in Exponential Families", UAI 2003, arxiv:1212.2512
- Lin Yuan, Sergey Kirshner, Robert Givan, "Estimating Densities with Non-Parametric Exponential Families", arxiv:1206.5036

- Modesty forbids me to recommend:
- CRS and Alessandro Rinaldo, "Consistency under Sampling of Exponential Random Graph Models", Annals of Statistics **forthcoming**, arxiv:1111.3054 [Our results are actually about exponential families of stochastic processes in general, though inspired by and applied to puzzles arising from the ERGM situation]

- To read:
- O. E. Barndorff-Nielsen, Information and Exponential Families
- Andrew R. Barron and Chyong-Hwa Sheu, "Approximation of Density Functions by Sequences of Exponential Families", Annals of Statistics **19**(1991): 1347--1369
- Imre Csiszar and Frantisek Matus, "Closures of exponential families", Annals of Probability **33**(2005): 582--600 = math.PR/0503653
- J. L. Denny, "Sufficient Conditions for a Family of Probabilities to be Exponential", Proceedings of the National Academy of Sciences **57**(1967): 1184-- ["We make the following statement precise under fairly weak conditions: in an experiment, if we summarize \( n \) statistically independent observations \( (x_1, \ldots x_n ) \) in \( m < n \) real numbers \( (y_1, \ldots y_m) \), where \( y_j = \sum_{i=1}^{n}{f_j(x_i)} \) and the \( f_j \) are given functions, and if we assume we have lost no information by the summary, then the family of probabilities associated with the experiment must be an exponential family."]
- Mark L. Huber, "Approximation algorithms for the normalizing constant of Gibbs distributions", arxiv:1206.2689
- Sham Kakade, Ohad Shamir, Karthik Sridharan, Ambuj Tewari, "Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity", Journal of Machine Learning Research Proceedings **9**(2010): 381--388
- Uwe Küchler and Michael Sørensen, Exponential Families of Stochastic Processes
- Qiang Liu, Jian Peng, Alexander Ihler and John Fisher III, "Estimating the Partition Function by Discriminance Sampling", UAI 2015
- Richard Lockhart and Federico O'Reilly, "A note on Moore's conjecture", Statistics and Probability Letters **74**(2005): 212--220 ["We establish the conjecture of Moore ... that the usual plug-in estimate of a distribution function and the Rao-Blackwell estimate of the distribution function are asymptotically equivalent for a wide class of exponential family distributions."]
- Frank Nielsen, "Chernoff information of exponential families", arxiv:1102.2684
- Saisuke Okabayashi and Charles J. Geyer, "Long range search for maximum likelihood in exponential families", Electronic Journal of Statistics **6**(2012): 123--147
- Johannes Rauh, "Optimally approximating exponential families", arxiv:1111.0483
- Vincent Rivoirard, Judith Rousseau, "Posterior Concentration Rates for Infinite Dimensional Exponential Families", Bayesian Analysis **7**(2012): 311--334
- Martin J. Wainwright and Michael I. Jordan, "Graphical Models, Exponential Families, and Variational Inference", Foundations and Trends in Machine Learning **1**(2008): 1--305 [PDF reprint via Prof. Jordan]

- To write:
- CRS, "Exponential Families of Stochastic Automata and Their Mixtures"
- CRS + Co-conspirators to be named later, "Projective Structure and Parametric Inference in Exponential Families"