## Sufficient Statistics

*05 Aug 2019 12:28*

In statistical theory, a "statistic" is a well-behaved (i.e., "measurable")
function of the data, which is what's actually used in calculations or
inferences, rather than the full data set. E.g., the sample mean, the sample
median, the sample variance, etc. A statistic is *sufficient* if it is
just as informative as the full data. The concept was introduced by
R. A. Fisher in the 1920s, and refined by Jerzy Neyman in the 1930s.
Parametric sufficiency means that the statistic contains just as much
information about (some) parameter of the model as the full data. More
precisely: the actual data has a certain probability distribution conditional
on the statistic, which in general will also involve the parameter. The statistic
is sufficient if this conditional distribution is the *same* for all
parameter values. (That's actually clearer in algebra but I don't feel up to
writing it in HTML now.) Once we've controlled for the sufficient statistic,
nothing else --- not even the original data --- can tell us anything more about
the parameter. Predictive sufficiency is similar: given the predictively
sufficient statistic, future observations can be predicted as well as if the
whole past was available. Predictive sufficiency can be expressed concisely in
terms of mutual information.
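
The algebra mentioned parenthetically above is short enough to sketch. The notation here is an assumption rather than anything from the original: $X$ is the data, $T = T(X)$ the statistic, $\theta$ the parameter, and $p_\theta$ the density or mass function of $X$. Parametric sufficiency says that

$$
P_\theta(X = x \mid T(X) = t) = P_{\theta'}(X = x \mid T(X) = t) \quad \text{for all } \theta, \theta', x, t,
$$

which, under the usual regularity conditions, is equivalent to the Fisher--Neyman factorization

$$
p_\theta(x) = g_\theta(T(x))\, h(x),
$$

where $h$ does not involve $\theta$. One common way to write predictive sufficiency of a statistic $T$ of the past, using the mutual information $I[\cdot\,;\cdot]$ (again with assumed notation $X_{\mathrm{past}}$ and $X_{\mathrm{future}}$), is

$$
I[X_{\mathrm{future}}; T(X_{\mathrm{past}})] = I[X_{\mathrm{future}}; X_{\mathrm{past}}].
$$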

A *necessary* statistic is one which can be computed from any
sufficient statistic, without reference to the original data. (It's
"necessary" in the sense that any optimal inference implicitly involves knowing
the necessary statistic.) Under pretty general conditions, maximum likelihood
estimates are necessary statistics, though they are not always sufficient. A
*minimal sufficient* statistic is one which is both necessary and
sufficient --- i.e., it's just as informative as the original data, but it can
be computed from any other sufficient statistic; no further compression of the
data is possible, without losing some information.
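
To make these definitions concrete, here is a minimal sketch in Python, assuming i.i.d. Bernoulli data (a toy model chosen for illustration, not one discussed above). The function names and the particular sample are hypothetical. It checks by brute force that the conditional probability of a sequence given its sum is the same for every parameter value, and that the maximum likelihood estimate can be computed from either of two sufficient statistics without going back to the raw data.

```python
from itertools import product
from math import comb, isclose


def sequence_prob(x, p):
    """Probability of one particular 0/1 sequence x under i.i.d. Bernoulli(p)."""
    k = sum(x)
    return p**k * (1 - p) ** (len(x) - k)


def conditional_given_sum(x, p):
    """P(X = x | sum(X) = sum(x)) under i.i.d. Bernoulli(p), by enumeration."""
    n, k = len(x), sum(x)
    # Total probability of every sequence that shares the same sum.
    total = sum(
        sequence_prob(y, p) for y in product((0, 1), repeat=n) if sum(y) == k
    )
    return sequence_prob(x, p) / total


x = (1, 0, 1, 1, 0)  # illustrative sample; the sample size n is treated as known
n = len(x)

for p in (0.2, 0.5, 0.9):
    # Sufficiency of the sum: the conditional probability is 1 / C(5, 3)
    # no matter which p generated the data.
    assert isclose(conditional_given_sum(x, p), 1 / comb(n, sum(x)))

# Two sufficient statistics: the sum, and a finer one that keeps two partial
# sums. The finer one is sufficient but not minimal, since the sum can be
# computed from it and not the other way around.
t_sum = sum(x)
t_split = (sum(x[:2]), sum(x[2:]))

# The MLE of p is a necessary statistic here: it is a function of either
# sufficient statistic alone.
mle_from_sum = t_sum / n
mle_from_split = sum(t_split) / n
assert mle_from_sum == mle_from_split
```

In this toy model the sum is minimal sufficient, so the MLE (the sum divided by a known `n`) happens to be sufficient as well; as the text notes, that is not true in general.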

A lot of my work has involved describing and finding predictively sufficient statistics for time series and spatio-temporal processes. It turns out that the statistical sufficiency property gives rise to a Markov property for the statistics. (Basically, computational mechanics turns out to be about constructing predictively sufficient statistics.) So I'm very interested in sufficiency in general, and especially how it relates to Markovian representations of non-Markovian processes.

Topics of particular interest: Necessary and sufficient conditions for the existence of non-trivial sufficient statistics; dimensionality of sufficient statistics; geometric and probabilistic characterizations; decision-theoretic properties; necessary statistics; minimal sufficient statistics for transducers; connections to causal inference; relationship between sufficiency and ergodic theory; characterization of different classes of stochastic processes in terms of their sufficient statistics; exponential families.

- Recommended, big picture:
- Sufficiency is a very important topic in statistical inference, and any good book on theoretical statistics will cover it in depth. I like Mark Schervish's Theory of Statistics, but really any one will do.
- Persi Diaconis, "Sufficiency as Statistical Symmetry", Proceedings of the AMS Centennial Symposium 15--26 [1988; PDF]
- E. B. Dynkin, "Sufficient statistics and extreme points", Annals of Probability **6** (1978): 705--730 ["The connection between ergodic decompositions and sufficient statistics is explored in an elegant paper by DYNKIN" --- Kallenberg, Foundations of Modern Probability, p. 577.]

- Recommended, close ups:
- R. R. Bahadur, "Sufficiency and statistical decision functions," Annals of Mathematical Statistics **25** (1954): 423--462
- David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions [Blackwell was a pioneer in exploring the decision-theoretic properties of sufficiency, and this excellent old book contains many deep theorems in this area]
- Ronald W. Butler, "Predictive Likelihood Inference with Applications", Journal of the Royal Statistical Society B **48** (1986): 1--38 ["in the predictive setting, all parameters are nuisance parameters". JSTOR]
- John W. Fisher III, Alexander T. Ihler and Paul A. Viola, "Learning Informative Statistics: A Nonparametric Approach", pp. 900--906 in NIPS 12 (1999) [PDF reprint. I'd call this more of a semi-parametric approach than a fully non-parametric one; they assume a parametric form for the dependence structure, but are agnostic about the distributions of innovations, and so try to maximize non-parametrically estimated mutual informations. In the limit, this will give them sufficient statistics.]
- R. A. Fisher
  - "A Mathematical Examination of the Methods of Determining the Accuracy of an Observation by the Mean Error, and by the Mean Square Error", Monthly Notices of the Royal Astronomical Society **80** (1920): 758--770 [Apparently the first time the sufficiency property was noted, though Fisher does not use that term here. PDF]
  - "On the Mathematical Foundations of Theoretical Statistics", Philosophical Transactions of the Royal Society A **222** (1922): 309--368 [Formal introduction of the concept, and the name, of sufficiency, along with much else that has proved fundamental to statistics, such as the likelihood function and the method of maximum likelihood. PDF in two parts, 1, 2]
  - "Theory of Statistical Estimation", Proceedings of the Cambridge Philosophical Society **22** (1925): 700--725 [Often, but mistakenly, cited in place of the 1922 paper; admittedly, clearer. PDF]

- Solomon Kullback, Information Theory and Statistics
- Solomon Kullback and R. A. Leibler, "On Information and Sufficiency", Annals of Mathematical Statistics **22** (1951): 79--86
- Rudolf Kulhavy, Recursive Nonlinear Estimation: A Geometric Approach
- Steffen L. Lauritzen
  - Extremal Families and Systems of Sufficient Statistics [Mini-review.]
  - "Extreme Point Models in Statistics", Scandinavian Journal of Statistics **11** (1984): 65--91 [Highlights of the book, without proofs but with decent typography. With useful discussion and a reply. JSTOR]
  - "Sufficiency, Prediction and Extreme Models", Scandinavian Journal of Statistics **1** (1974): 128--134 [JSTOR]
  - "On the Interrelationships among Sufficiency, Total Sufficiency, and Some Related Concepts", Preprint 8, Institute of Mathematical Statistics, University of Copenhagen (July 1974) [PDF scan via Prof. Lauritzen]

- Benoit Mandelbrot, "The Role of Sufficiency and of Estimation in Thermodynamics", Annals of Mathematical Statistics **33** (1962): 1021--1038 [Extensive thermodynamic variables as sufficient statistics for the conjugate intensive variables; Gibbs canonical form arising from natural requirements on finite-dimensional sufficient statistics, which can only be achieved for exponential families of probability distributions. Very clever, and IMHO a real contribution to the foundations of statistical mechanics and thermodynamics.]
- Giorgio Picci, "Some Connections Between the Theory of Sufficient Statistics and the Identifiability Problem", SIAM Journal on Applied Mathematics **33** (1977): 383--398 [Introduces the idea of a "maximal identifiable statistic" --- the coarsest partition of hypothesis space where each equivalence class/cell of the partition gives rise to a *distinct* distribution of observables. (I would prefer "parameter", rather than "statistic", since it's a function of the distribution, not the observables, but that's a quibble.) It might be interesting to try to define emergence in these terms --- perhaps as a restriction on the observable sigma-field such that the equivalence classes of the maximal identifiable parameter become infinite-dimensional, or something like that. JSTOR. Thanks to Rhiannon Weaver for the pointer.]
- David Pollard, "A note on insufficiency and the preservation of Fisher information", arxiv:1107.3797
- Ge Xu, Biao Chen, "The Sufficiency Principle for Decentralized Data Reduction", arxiv:1207.3265

- To read:
- Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer, "Information geometry and sufficient statistics", arxiv:1207.6736
- M. S. Bartlett
- T. Bohlin, "Information pattern for linear discrete-time models with stochastic coefficients," IEEE Transactions on Automatic Control **15** (1970): 104--106 [On recursively-computable sufficient statistics]
- R. Dennis Cook, Liliana Forzani, and Adam J. Rothman, "Estimating sufficient reductions of the predictors in abundant high-dimensional regressions", Annals of Statistics **40** (2012): 353--384
- E. B. Dynkin, "Necessary and sufficient statistics for a family of probability distributions," Uspekhi matem. nauk **6** (1951): 68--90 [Apparently translated in Select. Trans. Math. Statist. Prob. **1** (1951): 23--41. Zacks, below, is supposed to follow closely]
- David Hinkley, "Predictive Likelihood", Annals of Statistics **7** (1979): 718--728
- V. S. Huzurbazar, Sufficient Statistics: Selected Contributions
- Anna Jencova and Denes Petz, "Sufficiency in quantum statistical inference", math-ph/0412093
- Kuang-Yao Lee, Bing Li, and Francesca Chiaromonte, "A general theory for nonlinear sufficient dimension reduction: Formulation and estimation", Annals of Statistics **41** (2013): 221--249, arxiv:1304.0580
- Yanyuan Ma and Liping Zhu, "Efficient estimation in sufficient dimension reduction", Annals of Statistics **41** (2013): 250--268
- W. J. Runggaldier and F. Spizzichino, "Sufficient conditions for finite dimensionality of filters in discrete time: A Laplace transform-based approach," Bernoulli **7** (2001): 211--221
- Morris Skibinsky, "Adequate Subfields and Sufficiency", Annals of Mathematical Statistics **38** (1967): 155--161
- Taiji Suzuki and Masashi Sugiyama, "Sufficient Dimension Reduction via Squared-Loss Mutual Information Estimation", Neural Computation **25** (2013): 725--758
- Andrew Tausz, "Properties of Conditional Expectation Operators and Sufficient Subfields", arxiv:1011.5162
- Brendan van Rooyen, Robert C. Williamson, "Le Cam meets LeCun: Deficiency and Generic Feature Learning", arxiv:1402.4884
- Tao Wang, Xu Guo, Peirong Xu, Lixing Zhu, "Transformed sufficient dimension reduction", arxiv:1401.0267
- Makoto Yamada, Gang Niu, Jun Takagi, Masashi Sugiyama, "Sufficient Component Analysis for Supervised Dimension Reduction", arxiv:1103.4998
- S. Zacks, The Theory of Statistical Inference [For material on necessary and sufficient statistics]