Books to Read While the Algae Grow in Your Fur, December 2013
Attention conservation notice: I have no taste.
- Julia Spencer-Fleming, Through the Evil Days
- Continuation of the long-running series. Pulls off the trick of making
domestic troubles, ice storms, and not-very-bright criminals equally
threatening. (Previously.)
- Prudence Shen and Faith Erin Hicks, Nothing Can Possibly Go Wrong
- Comic-book mind candy: a school story about evil cheerleaders vs. combat
robots. Delightful, though I don't usually care for such unalloyed social
realism.
- G. E. R. Lloyd, Disciplines in the Making: Cross-Cultural
Perspectives on Elites, Learning, and Innovation
- A learned comparative examination of how fields like mathematics, art, law,
religion, history, philosophy and medicine become marked off as cohesive and
(more or less) self-regulating disciplines, and some of the consequences of
doing so. There are no great revelations here (disciplines promote competence
in accepted skills and concepts, which can deepen understanding within a
limited area but may or may not lead to progress), but the pay-off is
Lloyd's generous erudition, extending across multiple ancient classical
cultures and ethnographic records.
- Stéphane Boucheron, Gábor Lugosi and Pascal
Massart,
Concentration Inequalities: A Nonasymptotic Theory of Independence
- Maxim Raginsky and Igal Sason, "Concentration of Measure
Inequalities in Information Theory, Communications and
Coding", Foundations and Trends in Communications and Information
Theory 10 (2013):
1--246, arxiv:1212.4663
- Probability distributions in high-dimensional spaces are deeply weird
creatures. Suppose we take a point uniformly from the interior of a
high-dimensional hyper-sphere of radius 1. Our intuition, from experience with
the disc and the three-dimensional ball, is that most such points will be far
from the surface of the hyper-sphere. However, a little bit of calculus is
enough to show that as the dimension grows, the ratio of the surface area to
the volume also grows. This means that a thin, constant-thickness shell just
inside the surface ends up containing almost all of the volume --- almost all
random points are very close to the surface.
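- To spell out that little bit of calculus (my own gloss, not a passage from either book): the volume of a $d$-dimensional ball of radius $r$ scales as $r^d$, so the fraction of the unit ball's volume lying in a shell of thickness $\epsilon$ just inside the surface is
  $$1 - (1-\epsilon)^d \longrightarrow 1 \quad \text{as } d \to \infty$$
  for any fixed $\epsilon > 0$. Already at $d = 100$ and $\epsilon = 0.05$, that shell holds more than 99% of the volume.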
- So far, these are just curious facts about high-dimensional geometry, but
they have deeper consequences. This sort of "concentration of measure
phenomenon" turns out to be quite generic for high-dimensional distributions
where each coordinate follows its own distribution, independent of the other
coordinates. There is usually some set which captures almost all of the
probability, and other positive-probability sets have to be expanded just a
little to overlap with it. If we can figure out how things look on this
high-probability set, we know most of what the whole distribution will do.
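- One standard way of making "expanded just a little" precise --- my choice of illustration, via the Gaussian isoperimetric inequality of Borell and Sudakov--Tsirelson, rather than anything specific to these books --- is to take the coordinates to be independent standard Gaussians: if $A$ is any set with Gaussian measure $\gamma_n(A) \geq 1/2$, and $A_t$ is the set of points within Euclidean distance $t$ of $A$, then
  $$\gamma_n(A_t) \geq 1 - e^{-t^2/2},$$
  no matter what the dimension $n$ is.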
- More specifically, the concentration of the probability measure on a
particular set turns out to be equivalent to the following: all sufficiently
nice functions are, with very high probability, close to their expectation
values or their medians. In fact, one can derive results of the form "in $n$
dimensions, the probability that any function $f$ in a well-behaved class
$\mathcal{F}$ differs from its expectation value by $\epsilon$ or more is at
most $C e^{-n r(\epsilon)}$", with explicit values for the constant $C$ and
the rate-function $r(\epsilon)$. (Often, but not necessarily,
$r(\epsilon) \propto \epsilon^2$.) These might depend on the true distribution
and on the function class $\mathcal{F}$, but not on the specific function $f$.
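- The canonical instance --- my pick of example, the bounded-differences (McDiarmid) inequality, which both books cover --- runs as follows: if $X_1, \ldots, X_n$ are independent and $f$ changes by at most $c_i$ when only its $i$-th argument changes, then
  $$\mathbb{P}\left(|f(X_1,\ldots,X_n) - \mathbb{E} f| \geq \epsilon\right) \leq 2 \exp\left(-\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2}\right).$$
  With $c_i = c/n$, as for a sample mean of variables bounded in an interval of length $c$, this is exactly of the promised $C e^{-n r(\epsilon)}$ form, with $r(\epsilon) \propto \epsilon^2$.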
- To appreciate the significance of this, one needs to think back to more
classic parts of probability theory. The laws of large numbers say that
the averages of many independent random variables converge,
asymptotically, on expectation values. With some work, these results can be
extended to other sorts of functions of many variables, not just averages,
under conditions which amount to saying "the function can't depend too strongly
on any one variable". Such laws do not say how fast the convergence happens.
The large
deviations principles give asymptotic rates for the convergence of
averages, and some other functions of many variables, but say little or nothing
about what happens at finite $n$. What sets concentration bounds apart is
that they put definite limits on the fluctuations away from typical behavior
at finite sizes --- they are non-asymptotic.
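- A toy illustration of what "non-asymptotic" buys you (my own sketch, not an exercise from either book): Hoeffding's inequality says that for $n$ i.i.d. variables taking values in $[0,1]$, $\mathbb{P}(|\bar{X}_n - \mathbb{E}\bar{X}_n| \geq \epsilon) \leq 2e^{-2n\epsilon^2}$ at every finite $n$, which is easy to check against simulation:

```python
import numpy as np

rng = np.random.default_rng(2013)
n, eps, trials = 200, 0.1, 100_000   # my arbitrary choices for the demo

# i.i.d. Uniform(0,1) draws; each sample mean has expectation 1/2
x = rng.uniform(size=(trials, n))
deviations = np.abs(x.mean(axis=1) - 0.5)

empirical_tail = np.mean(deviations >= eps)     # Monte Carlo tail estimate
hoeffding_bound = 2 * np.exp(-2 * n * eps**2)   # valid at this finite n

print(f"P(|mean - 1/2| >= {eps}): {empirical_tail:.2e} <= {hoeffding_bound:.4f}")
```

  The bound is loose here, as worst-case bounds tend to be, but it holds already at $n = 200$, with no appeal to $n \to \infty$.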
- The idea of concentration of measure goes back to work by Paul Lévy
and A. I. Khinchin before World War II. The field has really grown rapidly
over the last two decades, however, partly because of enabling technical
developments, and partly because of the growing demand from applications.
Machine learning has emerged as a field which really, really wants to be able
to give bounds on how well a predictor learned from a finite amount of data
will generalize to new data. In- and out-of-sample prediction errors are
complicated functions of the training data, and the size of the sample is
hardly ever so overwhelming that asymptotics are comforting. Thus
concentration bounds are exactly the sort of thing
a learning
theorist wants [1].
- These are two excellent introductions to concentration of
measure, and its uses for probability theory and its applications. The book of
Boucheron et al. is more pedagogical, particularly in its early
chapters, and at once more interested in purely-mathematical issues and in
machine learning. That of Raginsky and Sason is not so leisurely or so
catholic in its exposition (it's half the length of Boucheron et al.),
and, understandably, gives a lot more space to applications in information and
communication. Both of them, however, do an admirable job of conveying not
just the major results, but also the major classes of techniques used to prove
them, and so open the way for their readers to go about deriving new results.
Boucheron et al. have nothing on how to extend concentration from
independent to weakly-dependent random variables; Raginsky and Sason treat
martingales much more extensively (in chapter 2), and have a little, but only a
little, on more-interesting-to-me stochastic processes.
(Since the latter is what really
got me interested in concentration, I probably regret this more than most
readers will.) They both also largely punt on the very important, but
forbiddingly technical, work of Talagrand and collaborators, though again
Raginsky and Sason say a bit more.
- I would have absolutely no hesitation in using either to teach a class to
advanced graduate students in statistics, machine learning, or allied fields,
with the choice depending entirely on their background, interests, and funding
[2].
- Disclaimer: Maxim is a friend, and we have vague plans for a paper
together.
- [1]: That said, concentration in the technical
sense is more of a sufficient condition than a necessary one; cf.
this interesting recent paper by
Mendelson.
- [2]: In the "why oh why can't we have a better
academic publishing system" department, I note that Raginsky and Sason's book
is free online, while Boucheron et al.'s has a list price of $125.
The latter is a well-produced and sturdy hardback, but that is a very steep
price for the printing, binding and distribution services of the Oxford
University Press.
Books to Read While the Algae Grow in Your Fur;
Enigmas of Chance;
Pleasures of Detection, Portraits of Crime;
Writing for Antiquity;
The Collective Use and Evolution of Concepts
Posted at December 31, 2013 23:59 | permanent link