Notebooks

## Large Deviations

15 Aug 2019 14:41

The limit theorems of probability theory --- the weak and strong laws of large numbers, the central limit theorem, etc. --- basically say that averages taken over large samples (of well-behaved independent, identically distributed random variables) converge on expectation values. (The strong law of large numbers asserts almost-sure convergence, the central limit theorem asserts a kind of convergence in distribution, etc.) These results say little or nothing about the rate of convergence, however, which matters for many applications of probability theory, e.g., statistical mechanics. One way to address this is the theory of large deviations. (I believe the terminology goes back to Varadhan in the 1970s, but that's just an impression, rather than research.)

Let me say things sloppily first, so the idea comes through, and then more precisely, so people who know the subject won't get too upset. Suppose $X$ is a random variable with expected value $\mathbf{E}[X]$, and we consider $S_n \equiv \frac{1}{n}\sum_{i=1}^{n}{X_i}$, the sample mean of $n$ samples of $X$. $S_n$ "obeys a large deviations principle" if there is a non-negative function $r$, called the rate function, such that $\Pr{\left(\left| \mathbf{E}[X] - S_n \right| \geq \epsilon\right)} \rightarrow e^{-nr(\epsilon)} ~.$
(The rate function has to obey some sensible but technical continuity conditions.) This is a large deviation result, because the difference between the empirical mean and the expectation is remaining constant as $n$ grows --- there has to be a larger and larger conspiracy, as it were, among the samples to keep deviating from the expectation in the same way. Now, one reason what I've stated isn't really enough to satisfy a mathematician is that the right-hand side converges on zero, so any functional form for the probability which also converges on zero would satisfy it, but we want to pick out exponential convergence. The usual fix is to look at the limiting decay rate of the probability. Also, we want the probability that the difference between the empirical mean and the expectation falls into an arbitrary set. So one usually sees the LDP asserted in some form like: for any reasonable set $A$, $\lim_{n\rightarrow\infty}{-\frac{1}{n}\log{\mathrm{Pr}\left(\left| \mathbf{E}[X] - S_n \right| \in A\right)}} = \inf_{x\in A}{r(x)} ~.$
(Actually, to be completely honest, I really shouldn't be assuming that there is a limit to those probabilities. Instead I should connect the lim inf of that expression to the infimum of the rate function over the interior of $A$, and the lim sup to the infimum of the rate function over the closure of $A$.)
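To make the limit concrete, here is a small numerical check --- a sketch in Python, standard library only, with the coin-flip setting and all tolerances chosen by me for illustration, not anything canonical. For the mean of fair coin flips, Cramér's theorem gives the rate function at deviation $\epsilon$ as the KL divergence $D(1/2 + \epsilon \| 1/2)$, and we can compute the exact binomial tail and watch $-\frac{1}{n}\log$ of it approach that rate:

```python
import math

def rate(eps, p=0.5):
    # Cramer rate function for the mean of Bernoulli(p) draws deviating
    # by eps: the KL divergence D(p + eps || p), in nats.
    q = p + eps
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def log_tail(n, eps, p=0.5):
    # Exact log P(|S_n - p| >= eps) for the mean of n Bernoulli(p) draws,
    # summing the binomial pmf in log-space to avoid underflow at large n.
    def log_pmf(k):
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log(1 - p))
    ks = (list(range(math.ceil(n * (p + eps) - 1e-9), n + 1))      # S_n >= p + eps
          + list(range(0, math.floor(n * (p - eps) + 1e-9) + 1)))  # S_n <= p - eps
    m = max(log_pmf(k) for k in ks)
    return m + math.log(sum(math.exp(log_pmf(k) - m) for k in ks))

# -(1/n) log P(|S_n - 1/2| >= 0.1) creeps down toward r(0.1) as n grows
for n in (100, 1000, 10000):
    print(n, -log_tail(n, 0.1) / n)
print("rate r(0.1):", rate(0.1))
```

Note how slow the convergence is --- the gap shrinks only like $\log n / n$ --- which is one reason the finite-sample inequalities mentioned below matter in practice.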

Similar large deviation principles can be stated for the empirical distribution, the empirical process, functionals of sample paths, etc., rather than just the empirical mean. There are tricks for relating LDPs on higher-level objects, like the empirical distribution over trajectories, to LDPs on lower-level objects, like empirical means. (These go under names like "the contraction principle".)
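For instance, here is the contraction principle at work in a toy sketch (a fair die, with all parameters my own invention): Sanov's theorem gives the rate for the empirical distribution as the relative entropy $D(q\|p)$, and contracting through the mean amounts to minimizing $D(q\|p)$ over distributions $q$ with a prescribed mean. The minimizer is an exponentially tilted distribution, and the contracted rate should agree with the Cramér rate obtained directly from the Legendre transform of the cumulant generating function:

```python
import math

# A fair six-sided die (a made-up example for illustration)
xs = [1, 2, 3, 4, 5, 6]
p = [1 / 6] * 6

def tilted(t):
    # Exponentially tilted distribution q_t(i) proportional to p_i * exp(t * x_i)
    w = [pi * math.exp(t * x) for pi, x in zip(p, xs)]
    z = sum(w)
    return [wi / z for wi in w]

def kl(q, ref):
    # Relative entropy D(q || ref), the Sanov rate function
    return sum(qi * math.log(qi / ri) for qi, ri in zip(q, ref) if qi > 0)

def contracted_rate(m):
    # Contraction principle: inf { D(q || p) : sum_i q_i x_i = m }.
    # Bisect on the tilt t until the tilted mean hits m, then evaluate the KL.
    lo, hi = -50.0, 50.0
    for _ in range(200):
        t = (lo + hi) / 2
        if sum(qi * x for qi, x in zip(tilted(t), xs)) < m:
            lo = t
        else:
            hi = t
    return kl(tilted((lo + hi) / 2), p)

def cramer_rate(m):
    # Direct route for comparison: sup_t [t m - log M(t)] on a crude grid
    def logM(t):
        return math.log(sum(pi * math.exp(t * x) for pi, x in zip(p, xs)))
    return max(t / 100 * m - logM(t / 100) for t in range(-2000, 2001))
```

Both routes vanish at the true mean of 3.5 and agree away from it (e.g. at $m = 4.5$), as the contraction principle promises.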

Since ergodic theory extends the probabilistic limit laws to stochastic processes, rather than just sequences of independent variables, it shouldn't be surprising that large deviation principles also hold for some stochastic processes. I am particularly interested in LDPs for Markov processes, and their applications. There are further important connections to information theory, since in an awful lot of situations, the large deviations rate function is the Kullback-Leibler divergence, a.k.a. the relative entropy.
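For a flavor of the Markov case, here is a toy two-state chain (numbers made up by me; this is the Gärtner-Ellis route, not the only one): the scaled cumulant generating function for a time average is the log of the Perron, i.e., largest, eigenvalue of an exponentially tilted transition matrix, and the rate function is its Legendre transform:

```python
import math

# A made-up two-state Markov chain; the observable is f(state) = state, so
# its time average is the fraction of time spent in state 1.
P = [[0.9, 0.1],
     [0.2, 0.8]]

def scgf(t):
    # Scaled cumulant generating function: log of the largest (Perron)
    # eigenvalue of the tilted matrix P~_{ij} = P_{ij} * exp(t * f(j)),
    # computed here from the 2x2 quadratic formula.
    a, b = P[0][0], P[0][1] * math.exp(t)
    c, d = P[1][0], P[1][1] * math.exp(t)
    return math.log((a + d + math.sqrt((a - d) ** 2 + 4 * b * c)) / 2)

def rate(x):
    # Legendre transform I(x) = sup_t [t * x - scgf(t)], by crude grid search
    return max(t / 100 * x - scgf(t / 100) for t in range(-1000, 1001))
```

The stationary distribution of this chain puts probability 1/3 on state 1, so the rate function vanishes at $x = 1/3$ and grows as $x$ moves away, quantifying how unlikely a long-run occupation frequency far from 1/3 is.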

Related, but strictly speaking distinct topics:

• Finite-sample deviation inequalities, such as the Bernstein, Chernoff and Hoeffding inequalities, which bound the probability of averages departing by more than a certain amount from expectation values at given finite sample sizes;
• Concentration of measure, roughly speaking upper bounds on deviation probabilities holding uniformly over large classes of functions. (Note that large deviations principles have matching upper and lower bounds, and need only hold asymptotically.)
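To illustrate the contrast, here is another coin-flip sketch (Python, standard library only, parameters chosen arbitrarily by me): Hoeffding's inequality bounds the two-sided tail for means of independent $[0,1]$-valued variables by $2 e^{-2n\epsilon^2}$ at every finite $n$, which we can set against the exact binomial tail:

```python
import math

def hoeffding_bound(n, eps):
    # Two-sided Hoeffding bound for the mean of n independent [0,1]-valued
    # random variables: P(|S_n - E X| >= eps) <= 2 exp(-2 n eps^2)
    return 2 * math.exp(-2 * n * eps * eps)

def exact_tail(n, eps, p=0.5):
    # Exact two-sided tail P(|S_n - p| >= eps) for the mean of n Bernoulli(p)
    pm = lambda k: math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    return (sum(pm(k) for k in range(math.ceil(n * (p + eps) - 1e-9), n + 1))
            + sum(pm(k) for k in range(0, math.floor(n * (p - eps) + 1e-9) + 1)))

for n in (10, 100, 1000):
    print(n, exact_tail(n, 0.1), "<=", hoeffding_bound(n, 0.1))
```

The bound holds at every sample size, with no limits taken; for coins near $p = 1/2$ its exponent $2\epsilon^2$ happens to be close to the Cramér rate, though for skewed $p$ it can be quite loose.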
Recommended, big picture:
• James Bucklew, Large Deviation Techniques in Decision, Simulation, and Estimation
• Thomas Cover and Joy Thomas, Elements of Information Theory [Very nice chapter on large deviations for IID sequences]
• Amir Dembo and Ofer Zeitouni, Large Deviations Techniques and Applications [Chapters 2, 4 and 5, and parts of chapter 6, are available in postscript format via Prof. Dembo's page for his course on large deviations]
• Frank den Hollander, Large Deviations [Nice introductory text for people with an applied probability background. Short.]
• Richard S. Ellis
• "The Theory of Large Deviations: from Boltzmann's 1877 Calculation to Equilibrium Macrostates in 2D Turbulence", Physica D 133 (1999): 106--136
• Entropy, Large Deviations, and Statistical Mechanics
• M. I. Freidlin and A. D. Wentzell, Random Perturbations of Dynamical Systems
• Hugo Touchette, "The Large Deviations Approach to Statistical Mechanics", Physics Reports 478 (2009): 1--69, arxiv:0804.0327
• S. R. S. Varadhan, "Large Deviations", Annals of Probability 36 (2008): 397--419 [Copy via Prof. Varadhan. Wald Lecture for 2005.]
Recommended, close-ups:
• R. R. Bahadur, Some Limit Theorems in Statistics [1971. The notation is now much more transparent, and the proofs of many basic theorems considerably simplified. But if there's a better source for statistical applications than this little book, I've yet to find it.]
• Julien Barré, Freddy Bouchet, Thierry Dauxois and Stefano Ruffo, "Large deviation techniques applied to systems with long-range interactions", cond-mat/0406358 = Journal of Statistical Physics 119 (2005): 677--713
• Michel Benaïm and Jörgen W. Weibull, "Deterministic Approximation of Stochastic Evolution in Games", Econometrica 71 (2003): 879--903 [JSTOR]
• Daniel Berend, Peter Harremoës, Aryeh Kontorovich, "Minimum KL-divergence on complements of L1 balls", arxiv:1206.6544
• Christian Borgs, Jennifer Chayes and David Gamarnik, "Convergent sequences of sparse graphs: A large deviations approach", arxiv:1302.4615 [See under graph limits]
• Arijit Chakrabarty, "Effect of truncation on large deviations for heavy-tailed random vectors", arxiv:1107.2476
• Sourav Chatterjee and S. R. S. Varadhan, "The large deviation principle for the Erdos-Renyi random graph", arxiv:1008.1946
• J.-R. Chazottes and D. Gabrielli, "Large deviations for empirical entropies of Gibbsian sources", math.PR/0406083 = Nonlinearity 18 (2005): 2545--2563 [This is a very cool result which shows that block entropies, and entropy rates estimated from those blocks, obey the large deviation principle even as one lets the length of the blocks grow with the amount of data, provided the block-length doesn't grow too quickly (only logarithmically). I wish I could write papers like this.]
• W. De Roeck, Christian Maes and Karel Netocny, "H-Theorems from Autonomous Equations", cond-mat/0508089 [this basically derives the H-theorem of statistical mechanics as a large deviations result, assuming a certain reasonable Markovian form for the macroscopic dynamics. In fact, I'd argue that if you don't have that Markovian form, you're just not trying hard enough; see here]
• Paul Dupuis, "Large Deviations Analysis of Some Recursive Algorithms with State-Dependent Noise", Annals of Probability 16 (1988): 1509--1536 [Open access]
• Gregory L. Eyink
• "Action principle in nonequilibrium statistical dynamics," Physical Review E 54 (1996): 3419--3435 [Least action as a consequence of Markovian LDP]
• "A Variational Formulation of Optimal Nonlinear Estimation," physics/0011049 [Nice connections between optimal state estimation (assuming a known form for the underlying stochastic process), nonequilibrium statistical mechanics, and large deviations theory, leading to tractable-looking numerical schemes for estimation.]
• Jin Feng and Thomas G. Kurtz, Large Deviations for Stochastic Processes [Online]
• R. L. Kautz
• C. M. Newman, J. E. Cohen and C. Kipnis, "Neo-darwinian evolution implies punctuated equilibria", Nature 315 (1985): 400--401
• Steven Orey and Stephan Pelikan, "Large deviations principles for stationary processes", Annals of Probability 16 (1988): 1481--1495
• Eric Smith, "Large-deviation principles, stochastic effective actions, path entropies, and the structure and meaning of thermodynamic descriptions", arxiv:1102.3938
• Eric Smith and Supriya Krishnamurthy, "Symmetry and Collective Fluctuations in Evolutionary Games", SFI Working Paper 11-03-010
• Hugo Touchette, "Asymptotic equivalence of probability measures and stochastic processes", arxiv:1708.02890
To write:
• CRS, "Large Deviations in Exponential Families of Stochastic Automata"

Previous versions: 2005-11-09 17:39 (but not the first version by any means)