Concentration of Measure

08 Dec 2023 12:17

Roughly speaking, one says that a probability measure is "concentrated" if any set of positive probability can be expanded very slightly to contain most of the probability. A classic example (which I learned in statistical mechanics) is the uniform distribution on the interior of unit spheres in high dimensions. A little bit of calculus shows that, as the number of dimensions grows, the ratio of the surface area to the volume does too, so a shell of thin, constant thickness on the interior of the sphere ends up containing almost all of its volume, and overlaps heavily with any set of positive probability. Equivalently, all well-behaved functional of a concentrated measure are very close to their expectation values with very high probability. (This is equivalent because one can specify a probability measure either through giving the probabilities of sets in the sigma-field, or by giving the expectation values of a sufficiently rich set of test functions, e.g., all bounded and continuous functions. [At least, bounded and continuous functions are enough for the weak topology on measures, but if you understand that qualification then you know what I'm saying anyway.])

In applications to probability, one is typically concerned with "concentration bounds", which are statements of the order of "for samples of size \( n \), the probability that any function \( f \) in a well-behaved class \( \mathcal{F} \) differs from its expectation value by \( \epsilon \) or more is at most \( C e^{-nr(\epsilon)} \)", with explicit values of the prefactor \( C \) and the rate function \( r \); these might depend on the true distribution and on the function class \( \mathcal{F} \), but not on the specific function \( f \).

These finite-sample upper bounds are to be distinguished from asymptotic results giving matching upper and lower bounds (large deviations theory proper). They are also distinct from deviation inequalities which may depend on the specific function \( f \), though obviously finding such deviation inequalities is often a first step to proving concentration-of-measure results.

(Most of what I know about this subject I've learned from my former student Aryeh Kontorovich, but don't blame him for my mis-apprehensions.)

See also: Empirical Process Theory; Learning Theory; Model Selection; Statistical Mechanics