## Concentration of Measure

*08 Dec 2023 12:17*

Roughly speaking, one says that a probability measure is "concentrated" if
any set of positive probability can be expanded very slightly to
contain *most* of the probability. A classic example (which I learned
in statistical mechanics) is the uniform
distribution on the interior of the unit sphere in high dimensions. A little bit
of calculus shows that, as the number of dimensions grows, the ratio of the
surface area to the volume does too, so a thin shell of constant thickness just
inside the sphere's surface ends up containing almost all of its volume, and
overlaps heavily with any set of positive probability. Equivalently, all
well-behaved functionals of a concentrated measure are very close to their
expectation values with very high probability. (This is equivalent because one
can specify a probability measure either through giving the probabilities of
sets in the sigma-field, or by giving the expectation values of a sufficiently
rich set of test functions, e.g., all bounded and continuous functions. [At
least, bounded and continuous functions are enough for the weak topology on
measures, but if you understand that qualification then you know what I'm
saying anyway.])
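A quick numerical sketch of the shell phenomenon: since the volume of a \( d \)-dimensional ball of radius \( r \) scales as \( r^d \), the fraction of the unit ball's volume within distance \( \epsilon \) of the surface is \( 1 - (1-\epsilon)^d \), which tends to 1 as \( d \) grows. (The function name here is my own, for illustration.)

```python
def shell_volume_fraction(d, eps):
    """Fraction of the d-dimensional unit ball's volume lying within
    distance eps of its surface.  The volume of a ball of radius r
    scales as r**d, so this fraction is 1 - (1 - eps)**d."""
    return 1.0 - (1.0 - eps) ** d

# A shell of fixed thickness 0.01 holds almost all the volume once d is large.
for d in (3, 100, 1000):
    print(d, shell_volume_fraction(d, 0.01))
```

In three dimensions the shell holds about 3% of the volume; by a thousand dimensions it holds essentially all of it.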

In applications to probability, one is typically concerned with "concentration bounds", which are statements of the order of "for samples of size \( n \), the probability that any function \( f \) in a well-behaved class \( \mathcal{F} \) differs from its expectation value by \( \epsilon \) or more is at most \( C e^{-nr(\epsilon)} \)", with explicit values of the prefactor \( C \) and the rate function \( r \); these might depend on the true distribution and on the function class \( \mathcal{F} \), but not on the specific function \( f \).
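As a concrete instance of such a bound (my example, not part of the original text): Hoeffding's inequality for i.i.d. variables taking values in \( [0,1] \) has exactly this form, with \( C = 2 \) and \( r(\epsilon) = 2\epsilon^2 \). A minimal sketch comparing the bound to a Monte Carlo estimate of the deviation probability for fair coin flips:

```python
import math
import random

def hoeffding_bound(n, eps):
    # Two-sided Hoeffding bound for i.i.d. variables in [0, 1]:
    # P(|S_n/n - E[X]| >= eps) <= 2 * exp(-2 * n * eps**2),
    # i.e., C = 2 and r(eps) = 2 * eps**2 in the notation above.
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

def deviation_probability(n, eps, trials=2000, seed=0):
    # Monte Carlo estimate of P(|empirical mean - 1/2| >= eps)
    # for n fair coin flips, to compare against the bound.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) >= eps:
            hits += 1
    return hits / trials

print(hoeffding_bound(200, 0.1), deviation_probability(200, 0.1))
```

The bound is loose but, crucially, holds at every finite \( n \) and does not depend on which particular function of the sample one looks at, so long as it stays in the bounded class.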

These finite-sample upper bounds are to be distinguished from asymptotic results giving matching upper and lower bounds (large deviations theory proper). They are also distinct from deviation inequalities which may depend on the specific function \( f \), though obviously finding such deviation inequalities is often a first step to proving concentration-of-measure results.

(Most of what I know about this subject I've learned from my former student Aryeh Kontorovich, but don't blame him for my mis-apprehensions.)

See also: Empirical Process Theory; Learning Theory; Model Selection; Statistical Mechanics

- Recommended, big picture:
- Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence [Mini-review]
- Andreas Maurer, "Thermodynamics and Concentration", Bernoulli **18**(2012): 434--454, arxiv:1205.1595 [Deriving concentration bounds from statistical-mechanical arguments; very nice.]
- Maxim Raginsky and Igal Sason, "Concentration of Measure Inequalities in Information Theory, Communications and Coding", Foundations and Trends in Communications and Information Theory **10**(2013): 1--246, arxiv:1212.4663 [Mini-review]

- Recommended, close-ups:
- Sergey Bobkov and Mokshay Madiman, "Concentration of the information in data with log-concave distributions", Annals of Probability **39**(2011): 1528--1543, arxiv:1012.5457
- Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, "Concentration inequalities using the entropy method", Annals of Probability **31**(2003): 1583--1614
- Sourav Chatterjee and Partha S. Dey, "Applications of Stein's method for concentration inequalities", Annals of Probability **38**(2010): 2443--2485, arxiv:0906.1034
- J.-R. Chazottes, P. Collet, C. Kuelske, and F. Redig, "Concentration inequalities for random fields via coupling", Probability Theory and Related Fields **137**(2007): 201--225, math.PR/0503483
- Leonid (Aryeh) Kontorovich [Comments]
- "Obtaining Measure Concentration from Markov Contraction", arxiv:0711.0987 [I really don't know why Leo bothered to take my class]
- "Measure Concentration of Hidden Markov Processes", math.PR/0608064
- "Measure concentration of Markov Tree Processes", math.PR/0608511
- "Concentration in unbounded metric spaces and algorithmic stability", pp. 2--36 in ICML 2014, arxiv:1309.1007

- Aryeh Kontorovich and Anthony Brockwell, "A Strong Law of Large Numbers for Strongly Mixing Processes", arxiv:0807.4665
- Aryeh Kontorovich and Maxim Raginsky, "Concentration of measure without independence: a unified approach via the martingale method", arxiv:1602.00721
- Leonid Kontorovich and Kavita Ramanan, "Concentration Inequalities for Dependent Random Variables via the Martingale Method", Annals of Probability **36**(2008): 2126--2158, arxiv:math/0609835
- Pascal Massart, Concentration Inequalities and Model Selection [Using empirical process theory to get finite-sample, i.e., non-asymptotic, risk bounds for various forms of model selection. Available for free as a large PDF preprint.]
- V. Maume-Deschamps, "Concentration Inequalities and Estimation of Conditional Probabilities" [PDF]
- Huiming Zhang, Song Xi Chen, "Concentration Inequalities for Statistical Inference", arxiv:2011.02258 [A review of useful results]

- To read:
- Sylvain Arlot, Gilles Blanchard, and Etienne Roquain, "Some nonasymptotic results on resampling in high dimension, I: Confidence regions", Annals of Statistics **38**(2010): 51--82
- Alexander Barvinok, lecture notes on Measure Concentration
- Denis Belomestny and Vladimir Spokoiny, "Concentration inequalities for smooth random fields", arxiv:1307.1565
- S. G. Bobkov and F. Götze, "Concentration of empirical distribution functions with applications to non-i.i.d. models", Bernoulli **16**(2010): 1385--1414, arxiv:1011.6165
- Stéphane Boucheron and Maud Thomas
- "Concentration inequalities for order statistics", arxiv:1207.7209
- "Tail index estimation, concentration and adaptivity", arxiv:1503.05077

- Pietro Caputo, Georg Menz, Prasad Tetali, "Approximate tensorization of entropy at high temperature", arxiv:1405.0608
- Louis H. Y. Chen, Xiao Fang, "Multivariate Normal Approximation by Stein's Method: The Concentration Inequality Approach", arxiv:1111.4073
- Pierre Del Moral, Peng Hu, and Liming Wu, "On the Concentration Properties of Interacting Particle Processes", Foundations and Trends in Machine Learning **3**(2012): 225--389, arxiv:1107.1948
- Evarist Giné, Vladimir Koltchinskii, Wenbo Li, and Joel Zinn, "High Dimensional Probability", arxiv:math/0612726
- Sébastien Gouëzel and Ian Melbourne, "Moment bounds and concentration inequalities for slowly mixing dynamical systems", arxiv:1404.0645
- Nathael Gozlan, Christian Léonard, "Transport Inequalities. A Survey", arxiv:1003.3852
- Arnaud Guillin, Aldéric Joulin, "Measure concentration through non-Lipschitz observables and functional inequalities", arxiv:1202.2341
- Russell Impagliazzo, Valentine Kabanets, "Constructive Proofs of Concentration Bounds", Electronic Colloquium on Computational Complexity TR10-072 [From my superficial and biased inspection, the most impenetrably non-probabilistic proof of a probabilistic result I have ever seen. But potentially interesting.]
- Can M. Le, Elizabeta Levina and Roman Vershynin, "Concentration and regularization of random graphs", arxiv:1506.00669
- Jing Lei, "Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces", Bernoulli **26**(2020): 767--798
- Michel Ledoux, The Concentration of Measure Phenomenon
- Johannes C. Lederer and Sara A. van de Geer, "New Concentration Inequalities for Suprema of Empirical Processes", arxiv:1111.3486
- Jay Mardia, Jiantao Jiao, Ervin Tánczos, Robert D. Nowak, and Tsachy Weissman, "Concentration Inequalities for the Empirical Distribution", arxiv:1809.06522
- Cesar Maldonado, "Fluctuation Bounds for Chaos Plus Noise in Dynamical Systems", Journal of Statistical Physics **148**(2012): 548--564
- Nikolai Matni and Stephen Tu, "A Tutorial on Concentration Bounds for System Identification", arxiv:1906.11395
- Katalin Marton
- "Bounding $\bar{d}$-distance by informational divergence: a method to prove measure concentration", Annals of Probability **24**(1996): 857--866 [Thanks to Leo Kontorovich for the pointer]
- "Logarithmic Sobolev inequalities in discrete product spaces: a proof by a transportation cost distance", arxiv:1507.02803

- Elizabeth Meckes, "Quantitative asymptotics of graphical projection pursuit", arxiv:0811.2769
- Daniel Paulin, "Concentration inequalities for Markov chains by Marton couplings", arxiv:1212.2015
- Iosif Pinelis, "Optimal re-centering bounds, with applications to Rosenthal-type concentration of measure inequalities", arxiv:1111.2622
- Nathan Ross, "Fundamentals of Stein's method", arxiv:1109.1880
- Timothy John Sullivan, Houman Owhadi, "Equivalence of concentration inequalities for linear and non-linear functions", arxiv:1009.4913
- Talagrand, The Generic Chaining
- Joel A. Tropp, "An Introduction to Matrix Concentration Inequalities", arxiv:1501.01571
- Ramon van Handel, Probability in High Dimension [PDF lecture notes]
- Sara van de Geer
- "Statistical Theory for High-Dimensional Models", arxiv:1409.8557 [I presume that most of this is now also in her book on high-dimensional statistics]
- "On the uniform convergence of empirical norms and inner products, with application to causal inference", arxiv:1310.5523

- Roman Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science