Large Deviations

27 Sep 2015 22:41

The limit theorems of probability theory --- the weak and strong laws of large numbers, the central limit theorem, etc. --- basically say that averages taken over large samples (of well-behaved independent, identically distributed random variables) converge on expectation values. (The strong law of large numbers asserts almost-sure convergence, the central limit theorem asserts a kind of convergence in distribution, etc.) These results say little or nothing about the rate of convergence, however, which matters for many applications of probability theory, e.g., statistical mechanics. One way to address this is the theory of large deviations. (I believe the terminology goes back to Varadhan in the 1970s, but that's just an impression, rather than research.)

Let me say things sloppily first, so the idea comes through, and then more precisely, so people who know the subject won't get too upset. Suppose $ X $ is a random variable with expected value $ \mathbf{E}[X] $, and we consider $ S_n \equiv \frac{1}{n}\sum_{i=1}^{n}{X_i} $, the sample mean of $ n $ samples of $ X $. $ S_n $ "obeys a large deviations principle" if there is a non-negative function $ r $, called the rate function, such that \[ \Pr{\left(\left| \mathbf{E}[X] - S_n \right| \geq \epsilon\right)} \rightarrow e^{-nr(\epsilon)} ~. \]
(The rate function has to obey some sensible but technical continuity conditions.) This is a large deviation result, because the difference between the empirical mean and the expectation stays constant as $ n $ grows --- there has to be a larger and larger conspiracy, as it were, among the samples to keep deviating from the expectation in the same way. Now, one reason what I've stated isn't really enough to satisfy a mathematician is that the right-hand side converges on zero, so any functional form for the probability which also converges on zero would satisfy it; we want to pick out specifically exponential convergence. The usual way is to look at the limiting decay rate of the probability. Also, we want the probability that the difference between the empirical mean and the expectation falls into any arbitrary set. So one usually sees the LDP asserted in some form like: for any reasonable set $ A $, \[ \lim_{n\rightarrow\infty}{-\frac{1}{n}\log{\mathrm{Pr}\left(\left| \mathbf{E}[X] - S_n \right| \in A\right)}} = \inf_{x\in A}{r(x)} ~. \]
(Actually, to be completely honest, I really shouldn't be assuming that there is a limit to those probabilities. Instead I should connect the lim inf of that expression to the infimum of the rate function over the interior of $ A $, and the lim sup to the infimum of the rate function over the closure of $ A $.)
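A minimal numerical sketch of this convergence, assuming i.i.d. Bernoulli(1/2) variables --- the classic setting of Cramér's theorem, where the rate function for the sample mean works out explicitly as a relative entropy between Bernoulli distributions. The particular parameter values are just illustrative choices:

```python
import math

def log_binom_pmf(n, k, p):
    """Log of the exact Binomial(n, p) pmf at k, via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_binom_tail(n, p, k0):
    """Log of P(Binomial(n, p) >= k0), summed stably in log space."""
    logs = [log_binom_pmf(n, k, p) for k in range(k0, n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(L - m) for L in logs))

def rate(x, p):
    """Cramer rate function for Bernoulli(p) sample means: a relative entropy."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p, eps = 0.5, 0.1
for n in (100, 1000, 10000):
    k0 = math.ceil(n * (p + eps))
    # -(1/n) log P(S_n >= p + eps) should approach the rate r at p + eps
    print(n, -log_binom_tail(n, p, k0) / n, rate(p + eps, p))
```

The printed decay rates creep down toward the rate-function value as $ n $ grows; the gap closes only like $ \log n / n $, which is why the limit in the precise statement is needed.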

Similar large deviation principles can be stated for the empirical distribution, the empirical process, functionals of sample paths, etc., rather than just the empirical mean. There are tricks for relating LDPs on higher-level objects, like the empirical distribution over trajectories, to LDPs on lower-level objects, like empirical means. (These go under names like "the contraction principle".)
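As a concrete statement of the contraction principle (a standard formulation, not specific to any one source): if the sequence $ (X_n) $ obeys an LDP with rate function $ I $, and $ f $ is a continuous map, then $ Y_n = f(X_n) $ obeys an LDP with rate function \[ J(y) = \inf_{x :\, f(x) = y}{I(x)} ~. \] That is, the probability of seeing $ y $ at the lower level is governed by the cheapest higher-level configuration which maps to it.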

Since ergodic theory extends the probabilistic limit laws to stochastic processes, rather than just sequences of independent variables, it shouldn't be surprising that large deviation principles also hold for some stochastic processes. I am particularly interested in LDPs for Markov processes, and their applications. There are further important connections to information theory, since in an awful lot of situations, the large deviations rate function is the Kullback-Leibler divergence, a.k.a. the relative entropy.

Related, but strictly speaking distinct topics:

See also: Exponential Families of Probability Measures; Maximum entropy

Previous versions: 2005-11-09 17:39 (but not the first version by any means)