Notebooks

Sufficient Statistics

Last update: 08 Dec 2024 00:59
First version:

In statistical theory, a "statistic" is a well-behaved (i.e., "measurable") function of the data; it is what actually gets used in calculations or inferences, rather than the full data set --- e.g., the sample mean, the sample median, the sample variance. A statistic is sufficient if it is just as informative as the full data. The concept was introduced by R. A. Fisher in the 1920s, and refined by Jerzy Neyman in the 1930s. Parametric sufficiency means that the statistic contains just as much information about (some) parameter of the model as the full data does. More precisely: the data have a certain probability distribution conditional on the value of the statistic, and in general this conditional distribution will also involve the parameter. The statistic is sufficient if that conditional distribution is the same for all parameter values. (That's actually clearer in algebra, but I don't feel up to writing it in HTML now.) Once we've conditioned on a sufficient statistic, nothing else --- not even the original data --- can tell us anything more about the parameter. Predictive sufficiency is similar: given the predictively sufficient statistic, future observations can be predicted just as well as if the whole past were available. Predictive sufficiency can be expressed concisely in terms of mutual information.
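For what it's worth, the algebra alluded to above can be sketched in LaTeX (my rendering, not the original's), writing X for the data, T for the statistic, and theta for the parameter:

```latex
% T is sufficient for \theta when the conditional distribution of the
% data, given the value of the statistic, does not involve \theta:
\Pr\left(X = x \mid T(X) = t; \theta\right) = \Pr\left(X = x \mid T(X) = t\right)
% Equivalently (the Fisher--Neyman factorization), the likelihood splits
% into a factor depending on the data only through T, and a \theta-free factor:
p(x; \theta) = g(T(x); \theta)\, h(x)
% The predictive version, in mutual-information form: the statistic of the
% past retains all the information the past carries about the future,
I(X_{\mathrm{future}}; T(X_{\mathrm{past}})) = I(X_{\mathrm{future}}; X_{\mathrm{past}})
```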

A necessary statistic is one which can be computed from any sufficient statistic, without reference to the original data. (It's "necessary" in the sense that any optimal inference implicitly involves knowing the necessary statistic.) Under pretty general conditions, maximum likelihood estimates are necessary statistics, though they are not always sufficient. A minimal sufficient statistic is one which is both necessary and sufficient --- i.e., it's just as informative as the original data, but it can be computed from any other sufficient statistic; no further compression of the data is possible, without losing some information.
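A toy illustration of these ideas (my own example, not part of the original notes): for n i.i.d. Bernoulli(p) coin flips, the number of heads is sufficient --- the conditional distribution of the full sequence, given the count, is the same for every p --- and the maximum likelihood estimate of p is computable from the count alone, as a necessary statistic should be. A brute-force check by enumeration:

```python
from itertools import product
from math import isclose

def cond_dist_given_sum(n, p):
    """P(x | sum(x) = k) for each binary sequence x of length n,
    under i.i.d. Bernoulli(p) observations."""
    seqs = list(product([0, 1], repeat=n))
    lik = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in seqs}
    totals = {}
    for x in seqs:
        totals[sum(x)] = totals.get(sum(x), 0.0) + lik[x]
    return {x: lik[x] / totals[sum(x)] for x in seqs}

# The conditional distribution given the count is identical for p=0.3 and p=0.7:
d1 = cond_dist_given_sum(4, 0.3)
d2 = cond_dist_given_sum(4, 0.7)
assert all(isclose(d1[x], d2[x]) for x in d1)

# The MLE, k/n, is a function of the sufficient statistic alone:
mle = lambda k, n: k / n
```

(In fact the conditional distribution given k heads is uniform over the n-choose-k sequences with that count, which is visibly p-free.)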

A lot of my work has involved describing and finding predictively sufficient statistics for time series and spatio-temporal processes. It turns out that predictive sufficiency gives rise to a Markov property for the statistics themselves. (Basically, computational mechanics turns out to be about constructing predictively sufficient statistics.) So I'm very interested in sufficiency in general, and especially in how it relates to Markovian representations of non-Markovian processes.
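The mutual-information form of predictive sufficiency can be checked exactly in a small example (again my own sketch, with an assumed two-state transition matrix): for a stationary Markov chain, the last symbol is a predictively sufficient statistic of the past, so I(future; last symbol) equals I(future; whole past).

```python
from itertools import product
from math import log2, isclose

T = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # transition probabilities
pi = {0: 0.8, 1: 0.2}                            # stationary distribution of T

# Exact joint distribution of (x1, x2, x3): past = (x1, x2), future = x3
joint = {(x1, x2, x3): pi[x1] * T[x1][x2] * T[x2][x3]
         for x1, x2, x3 in product([0, 1], repeat=3)}

def mutual_info(pairs):
    """I(A; B) in bits, from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in pairs.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in pairs.items() if p > 0)

# I(future; whole past) ...
full = {}
for (x1, x2, x3), p in joint.items():
    full[((x1, x2), x3)] = full.get(((x1, x2), x3), 0.0) + p
# ... versus I(future; statistic = last symbol)
stat = {}
for (x1, x2, x3), p in joint.items():
    stat[(x2, x3)] = stat.get((x2, x3), 0.0) + p

assert isclose(mutual_info(full), mutual_info(stat))  # no predictive information lost
```

For a non-Markovian process the same check, run with the causal states of computational mechanics in place of the last symbol, is what makes those states a Markovian representation.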

Topics of particular interest: Necessary and sufficient conditions for the existence of non-trivial sufficient statistics; dimensionality of sufficient statistics; geometric and probabilistic characterizations; decision-theoretic properties; necessary statistics; minimal sufficient statistics for transducers; connections to causal inference; relationship between sufficiency and ergodic theory; characterization of different classes of stochastic processes in terms of their sufficient statistics; exponential families.

