## Concentration of Measure

*24 May 2019 14:39*

Roughly speaking, one says that a probability measure is "concentrated" if
any set of positive probability can be expanded very slightly to
contain *most* of the probability. A classic example (which I learned
in statistical mechanics) is the uniform
distribution on the interior of unit spheres in high dimensions. A little bit
of calculus shows that, as the number of dimensions grows, the ratio of the
surface area to the volume does too, so a thin shell of constant thickness on
the interior of the sphere ends up containing almost all of its volume, and
overlaps heavily with any set of positive probability. Equivalently, all
well-behaved functionals of a concentrated measure are very close to their
expectation values with very high probability. (This is equivalent because one
can specify a probability measure either through giving the probabilities of
sets in the sigma-field, or by giving the expectation values of a sufficiently
rich set of test functions, e.g., all bounded and continuous functions. [At
least, bounded and continuous functions are enough for the weak topology on
measures, but if you understand that qualification then you know what I'm
saying anyway.])
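The shell-volume claim above is easy to check numerically: since the volume of a \( d \)-dimensional ball of radius \( r \) scales as \( r^d \), the fraction of the unit ball's volume within distance \( \epsilon \) of the surface is \( 1 - (1-\epsilon)^d \), which tends to 1 as \( d \) grows, however small \( \epsilon \) is. A minimal sketch (the function name and the particular dimensions are mine):

```python
# Fraction of a d-dimensional unit ball's volume lying within a thin
# shell of thickness eps just inside the surface.  Ball volume scales
# as r^d, so this fraction is 1 - (1 - eps)^d, which -> 1 as d grows.

def shell_fraction(d, eps=0.01):
    """Volume fraction of the unit d-ball within eps of its surface."""
    return 1.0 - (1.0 - eps) ** d

# Even a shell of thickness 0.01 swallows nearly everything in high dimension:
for d in (2, 10, 100, 1000):
    print(d, round(shell_fraction(d), 4))
```

With \( \epsilon = 0.01 \), the shell holds about 2% of the volume in 2 dimensions but over 99.99% in 1000 dimensions.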

In applications to probability, one is typically concerned with "concentration bounds", which are statements of the order of "for samples of size \( n \), the probability that any function \( f \) in a well-behaved class \( \mathcal{F} \) differs from its expectation value by \( \epsilon \) or more is at most \( C e^{-nr(\epsilon)} \)", with explicit values of the prefactor \( C \) and the rate function \( r \); these might depend on the true distribution and on the function class \( \mathcal{F} \), but not on the specific function \( f \).
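The simplest instance of a bound of this shape is Hoeffding's inequality for a single bounded function: for the mean of \( n \) i.i.d. \([0,1]\)-valued variables, \( \Pr(|\bar{X}_n - \mathbb{E}X| \geq \epsilon) \leq 2e^{-2n\epsilon^2} \), i.e., \( C = 2 \) and \( r(\epsilon) = 2\epsilon^2 \). A quick simulation sketch comparing the bound to the actual deviation frequency for fair coin flips (the sample size, \( \epsilon \), and trial count below are arbitrary choices of mine):

```python
import random
from math import exp

# Hoeffding's inequality: for the mean of n i.i.d. [0,1]-valued variables,
# P(|mean - E| >= eps) <= 2 exp(-2 n eps^2).  In the notation above,
# C = 2 and r(eps) = 2 eps^2.  Check against simulated fair-coin flips.

random.seed(0)
n, eps, trials = 100, 0.1, 10_000

bound = 2 * exp(-2 * n * eps ** 2)

deviations = 0
for _ in range(trials):
    mean = sum(random.random() < 0.5 for _ in range(n)) / n
    if abs(mean - 0.5) >= eps:
        deviations += 1

print(f"empirical deviation frequency: {deviations / trials:.4f}")
print(f"Hoeffding bound:               {bound:.4f}")
```

As is typical, the bound is valid but loose: the empirical frequency comes out well below \( 2e^{-2} \approx 0.27 \).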

These finite-sample upper bounds are to be distinguished from asymptotic results giving matching upper and lower bounds (large deviations theory proper). They are also distinct from deviation inequalities which may depend on the specific function \( f \), though obviously finding such deviation inequalities is often a first step to proving concentration-of-measure results.

(Most of what I know about this subject I've learned from my former student Aryeh Kontorovich, but don't blame him for my mis-apprehensions.)

See also: Empirical Process Theory; Learning Theory; Model Selection; Statistical Mechanics

- Recommended, big picture:
- Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence [Mini-review]
- Andreas Maurer, "Thermodynamics and Concentration", Bernoulli **18**(2012): 434--454, arxiv:1205.1595 [Deriving concentration bounds from statistical-mechanical arguments; very nice.]
- Maxim Raginsky and Igal Sason, "Concentration of Measure Inequalities in Information Theory, Communications and Coding", Foundations and Trends in Communications and Information Theory **10**(2013): 1--246, arxiv:1212.4663 [Mini-review]

- Recommended, close-ups:
- Sergey Bobkov and Mokshay Madiman, "Concentration of the information in data with log-concave distributions", Annals of Probability **39**(2011): 1528--1543, arxiv:1012.5457
- Sourav Chatterjee and Partha S. Dey, "Applications of Stein's method for concentration inequalities", Annals of Probability **38**(2010): 2443--2485, arxiv:0906.1034
- J.-R. Chazottes, P. Collet, C. Kuelske, and F. Redig, "Concentration inequalities for random fields via coupling", Probability Theory and Related Fields **137**(2007): 201--225, math.PR/0503483
- Leonid (Aryeh) Kontorovich
[Comments]
- "Obtaining Measure Concentration from Markov Contraction", arxiv:0711.0987 [I really don't know why Leo bothered to take my class]
- "Measure Concentration of Hidden Markov Processes", math.PR/0608064
- "Measure concentration of Markov Tree Processes", math.PR/0608511
- "Concentration in unbounded metric spaces and algorithmic stability", arxiv:1309.1007

- Aryeh Kontorovich and Anthony Brockwell, "A Strong Law of Large Numbers for Strongly Mixing Processes", arxiv:0807.4665
- Aryeh Kontorovich and Maxim Raginsky, "Concentration of measure without independence: a unified approach via the martingale method", arxiv:1602.0072
- Leonid Kontorovich and Kavita Ramanan, "Concentration Inequalities for Dependent Random Variables via the Martingale Method", Annals of Probability **36**(2008): 2126--2158, arxiv:math/0609835
- Pascal Massart, Concentration Inequalities and Model Selection [Using empirical process theory to get finite-sample, i.e., non-asymptotic, risk bounds for various forms of model selection. Available for free as a large PDF preprint.]
- V. Maume-Deschamps, "Concentration Inequalities and Estimation of Conditional Probabilities" [PDF]

- To read:
- Sylvain Arlot, Gilles Blanchard, and Etienne Roquain, "Some nonasymptotic results on resampling in high dimension, I: Confidence regions", Annals of Statistics **38**(2010): 51--82
- Alexander Barvinok, lecture notes on Measure Concentration
- Denis Belomestny, Vladimir Spokoiny, "Concentration inequalities for smooth random fields", arxiv:1307.1565
- S.G. Bobkov and F. Götze, "Concentration of empirical distribution functions with applications to non-i.i.d. models", Bernoulli **16**(2010): 1385--1414, arxiv:1011.6165
- Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, "Concentration inequalities using the entropy method", Annals of Probability **31**(2003): 1583--1614
- Stéphane Boucheron and Maud Thomas, "Concentration inequalities for order statistics", arxiv:1207.7209
- Louis H. Y. Chen, Xiao Fang, "Multivariate Normal Approximation by Stein's Method: The Concentration Inequality Approach", arxiv:1111.4073
- Pierre Del Moral, Peng Hu, and Liming Wu, "On the Concentration Properties of Interacting Particle Processes", Foundations and Trends in Machine Learning **3**(2012): 225--389, arxiv:1107.1948
- Evarist Giné, Vladimir Koltchinskii, Wenbo Li, and Joel Zinn, "High Dimensional Probability", arxiv:math/0612726
- Nathael Gozlan, Christian Léonard, "Transport Inequalities. A Survey", arxiv:1003.3852
- Arnaud Guillin, Aldéric Joulin, "Measure concentration through non-Lipschitz observables and functional inequalities", arxiv:1202.2341
- Russell Impagliazzo, Valentine Kabanets, "Constructive Proofs of Concentration Bounds", Electronic Colloquium on Computational Complexity TR10-072 [From my superficial and biased inspection, the most impenetrably non-probabilistic proof of a probabilistic result I have ever seen. But potentially interesting.]
- Can M. Le, Roman Vershynin, "Concentration and regularization of random graphs", arxiv:1506.00669
- Ledoux, The Concentration of Measure Phenomenon
- Johannes C. Lederer, Sara A. van de Geer, "New Concentration Inequalities for Suprema of Empirical Processes", arxiv:1111.3486
- Cesar Maldonado, "Fluctuation Bounds for Chaos Plus Noise in Dynamical Systems", Journal of Statistical Physics **148**(2012): 548--564
- K. Marton, "Bounding $\bar{d}$-distance by informational divergence: a method to prove measure concentration", Annals of Probability **24**(1996): 857--866 [Thanks to Leo Kontorovich for the pointer]
- Elizabeth Meckes, "Quantitative asymptotics of graphical projection pursuit", arxiv:0811.2769
- Daniel Paulin, "Concentration inequalities for Markov chains by Marton couplings", arxiv:1212.2015
- Iosif Pinelis, "Optimal re-centering bounds, with applications to Rosenthal-type concentration of measure inequalities", arxiv:1111.2622
- Nathan Ross, "Fundamentals of Stein's method", arxiv:1109.1880
- Timothy John Sullivan, Houman Owhadi, "Equivalence of concentration inequalities for linear and non-linear functions", arxiv:1009.4913
- Talagrand, The Generic Chaining
- Joel A. Tropp, "An Introduction to Matrix Concentration Inequalities", arxiv:1501.01571
- Ramon van Handel, Probability in High Dimension [PDF lecture notes]
- Roman Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science