Books to Read While the Algae Grow in Your Fur, August 2012
Attention conservation notice: I have no taste.
- M. S. Bartlett, Stochastic Population Models in Ecology and Epidemiology
- Short (< 100 pp.) introduction to the topic as of 1960, and so inevitably
mostly of historical interest. It was aimed at people who already knew some
statistics and probability, but no biology to speak of. The models: correlated
spread of plants (or colonies) around Poisson-distributed centers; birth-death
models of population dynamics; Lotka-Volterra style competition between
species; susceptible-infected-removed models of epidemics, with and without
spatial structure. (These are, of course, still staples of the field...) All
models are presented generatively, developed from plausible, though explicitly
highly simplified, premises about how organisms behave. Bartlett makes a lot
of use of generating-functionology, and sometimes heroic approximations to get
closed-form expressions --- but he also introduces his readers to Monte Carlo,
and one gets the impression that he uses this as much as he could afford to.
Nicest of all, he takes pains to connect everything to real data.
- Despite its technical obsolescence, I plan to rip it off shamelessly for
examples when next I teach stochastic processes,
or complexity and
inference.
- Patricia A. McKillip, The Bards of Bone Plain
- Two interlocking stories about bards, far-separated in time, searching for
secrets in poetry. Beautiful prose, as always, but for the first time that I
can remember with one of McKillip's books, the ending felt rushed. Still
eminently worth reading.
- Lucy
A. Snyder, Switchblade Goddess
- Mind candy: a sorceress from Ohio continues (previous installments)
her efforts to get out of Texas, but keeps being dragged back to various hells.
I am deeply ambivalent about recommending it, however. The best I can do to
say why, without spoilers, is that key parts of the book are at once
effectively (even viscerally) narrated, and stuff I wish I'd never
encountered. Mileage, as they say, varies.
- Spoilers:
Gur pbasyvpg orgjrra Wrffvr naq gur rcbalzbhf "fjvgpuoynqr tbqqrff" guvf gvzr vaibyirf abg whfg zvaq-tnzrf, nf va gur cerivbhf obbx, ohg ivivqyl qrfpevorq naq uvtuyl frkhnyvmrq obql ubeebe, jvgu Wrffr'f bja ernpgvbaf gb gur ivbyngvbaf bs ure obql naq zvaq orvat irel zhpu n cneg bs obgu gur frkhnyvgl naq gur ubeebe. Senaxyl, gubfr puncgref fdhvpxrq zr gur shpx bhg. V'z cerggl fher gung'f jung gubfr cnegf bs gur obbx jrer vagraqrq gb qb, fb cbvagf sbe rssrpgvir jevgvat, ohg V qvqa'g rawbl vg ng nyy. Cneg bs gung znl or gur pbagenfg gb gur guevyyvat-nqiragher gbar bs gur cerivbhf obbxf, naq rira zbfg bs guvf bar. (Znlor vs V'q tbar va rkcrpgvat ubeebe?)
- Thomas W. Young, The
Renegades
- Mind candy; thriller about the US war in Afghanistan, drawing on the
author's experience as a military pilot. I liked it — Young has a knack
for effective descriptions in unflashy prose — but I am not sure if that
wasn't because it played to some of my less-defensible prejudices. (Sequel
to Silent Enemy and The Mullah's Storm, but
self-contained.)
- Lt. Col. Sir Wolseley Haig (ed.), Cambridge History of India, vol. III: Turks and Afghans
- Picked up because I ran across it at
the local used
bookstore, and it occurred to me I knew next to nothing about what happened
in India between the invasions
by Mahmud of Ghazni
and the conquest by Babur.
What I was neglecting was that this was published in 1928...
- Given over 500 pages to describe half a millennium of the life of one of the major branches of civilization, do you spend them on the daily life and customs of the people, craft and technology, commerce, science, literature, religion, administration, and the arts? Or do you rather devote them almost exclusively to
wars (alternately petty and devastating), palace intrigues, rebuking the long
dead for binge drinking, and biologically absurd speculations on the
"degeneration" of descendants of central Asians brought on by the climates of
India, with a handful of pages that mention Persianate poetry, religion (solely
as an excuse for political divisions) and tax farming? Evidently if you were a
British historian towards the end of the Raj, aspiring to write the definitive
history of India, c. +1000 to c. +1500, the choice was clear.
- — To be fair, the long final chapter, on monumental Muslim
architecture during the period, is informed and informative, though still full
of pronouncements about the tastes and capacities of "the Hindu"*, "the
Muslim"**, "the Persian"***, "the Arab" (and "Semitic peoples")****, , etc.
And no doubt there are readers for whom this sort of obsessive recital of politico-military futility is actually useful, and who would appreciate its being told briskly, which it is.
- Recommended primarily if you want a depiction of 500 years of aggression
and treachery that makes Game of Thrones seem like Jenny and
the Cat Club.
- *: "In the Indian architect this sense for the
decorative was innate; it came to him as a legacy from the pre-Aryan
races..."
- **: "Elaborate decoration and brightly coloured
ornament were at all times dear to the heart of the Muslim."
- ***: "[Persia's] genius was of the mimetic rather
than the creative order, but she possessed a magic gift for absorbing the
artistic creations of other countries and refining them to her own standard of
perfection."
- ****: "With the Arabs, who in the beginning of the
eighth century possessed themselves of Sind, our concern is small. Like other
Semitic peoples they showed but little natural instinct for architecture or the
formative arts." Not, "Our concern is small, because few of their works have
survived, and they seem to have had little influence on what came later", which
would have been perfectly reasonable.
- Alexandre B. Tsybakov, Introduction to Nonparametric Estimation
- What it says on the label. This short (~200 pp.) book is an introduction
to the theory of non-parametric statistical estimation, divided, like Gaul,
into three parts.
- The first chapter introduces the basic problems considered: estimating a
probability density function, estimating a regression function (with fixed and
random placement of the input variable), and estimating a function observed
through Gaussian noise. (The last of these has applications in signal
processing, not discussed, and equivalences to the other problems, treated in
detail.) The chapter then introduces the main methods to be used: kernel estimators, local polynomial estimators, and "projection" estimators (i.e., approximating the unknown function by a series expansion in orthogonal functions, especially but not exclusively Fourier expansions). The goal in this chapter is to establish
upper bounds on the error of the function estimates, for different notions of
error (mean-square at one point, mean-square averaged over space, maximum
error, etc.). The emphasis is on finding the asymptotic rate at which these
upper bounds go to zero. To achieve this, the text assumes that the unknown
function lies in a space of functions which are more or less smooth, and
upper-bounds how badly wrong kernels (or whatever) can go on such functions.
(If you find yourself skeptically muttering "And how do I know the regression
curve lies in a Sobolev \( \mathcal{S}(\beta,L) \) space1?", I would first of all ask you why assuming linearity
isn't even worse, and secondly ask you to wait until the third chapter.) A
typical rate here would be that the mean-squared error of kernel regression is
\( O(n^{-2\beta/(2\beta+1)}) \), where \( \beta > 0 \) is a measure of the
smoothness of the function class. While such upper bounds have real value, in
reassuring us that we can't be doing too badly, they may leave us worrying that
some other estimator, beyond the ones we've considered, would do much better.
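- To make the rate claim a little more concrete, here is a small simulation sketch of my own (not from the book): Nadaraya-Watson kernel regression on a smooth (\( \beta = 2 \)) curve, with the rate-optimal bandwidth \( h \propto n^{-1/(2\beta+1)} = n^{-1/5} \), so the mean-squared error at an interior point should shrink roughly like \( n^{-4/5} \); the rescaled MSE in the last column should then hover around a constant, up to Monte Carlo wobble.

```python
# A back-of-the-envelope check (my toy example, not Tsybakov's) of the
# n^{-2*beta/(2*beta+1)} rate for kernel regression with beta = 2.
import numpy as np

rng = np.random.default_rng(0)

def nw_estimate(x, y, x0, h):
    """Nadaraya-Watson estimate of E[Y|X=x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

x0 = 0.25                        # interior point; m(x0) = sin(2*pi*x0) = 1
truth = np.sin(2 * np.pi * x0)
for n in (100, 1000, 10000):
    h = 0.2 * n ** (-1 / 5)      # rate-optimal bandwidth scaling for beta = 2
    sq_errs = []
    for _ in range(500):         # Monte Carlo replications
        x = rng.uniform(0, 1, n)
        y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
        sq_errs.append((nw_estimate(x, y, x0, h) - truth) ** 2)
    mse = np.mean(sq_errs)
    print(f"n={n:6d}  MSE={mse:.5f}  MSE * n^(4/5) = {mse * n ** 0.8:.3f}")
```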
- The goal of the second chapter is to alleviate this worry, by establishing
lower bounds, and showing that they match the upper bounds found in chapter 1.
This is a slightly tricky business. Consider
the
calibrating
macroeconomist Fool2
who says in his heart "The regression line is \( y = x/1600 \)", and sticks to
this no matter what the data might be. In general, the Fool has horrible, \(
O(1) \) error --- except when he's right, in which case his error is exactly
zero. To avoid such awkwardness, we compare our non-parametric estimators to
the minimax error rate, the error which would be obtained by a
slightly-imaginary3 estimator
designed to make its error on the worst possible function as small as possible.
(What counts as "the worst possible function" depends on the estimator, of
course.) The Fool is not the minimax estimator, since his worst-case error is
\( O(1) \), and the upper bounds tell us we could at least get \(
O(n^{-2\beta/(2\beta+1)}) \).
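- In symbols (my gloss, not a quotation from the book): the minimax risk over a class of functions \( \Sigma \), say a Sobolev ball \( \mathcal{S}(\beta, L) \), is
\[ \inf_{\hat{f}_n} \sup_{f \in \Sigma} \mathbb{E}_f\left[ d^2(\hat{f}_n, f) \right], \]
where the infimum runs over all estimators based on \( n \) observations, and \( d \) is whichever distance we are using to measure error. Saying the Chapter 1 rates are optimal is saying that this quantity shrinks at the same \( n^{-2\beta/(2\beta+1)} \) rate as the upper bounds, which is what the lower bounds of Chapter 2 deliver.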
- To get actual lower bounds, we use the correspondence between estimation
and testing. Suppose we can always find two far-apart regression curves no
hypothesis test could tell apart reliably. Then the expected estimation error
has to be at least the testing error-rate times the distance between those
hypotheses. (I'm speaking a little loosely; see the book for details.) To
turn it around, if we can estimate functions very precisely, we can use our
estimates to reliably test which of various near-by functions are right. Thus,
invoking Neyman-Pearson theory, and various measures of
distance or divergence between probability distributions, gives us fundamental
lower bounds on function estimation. This reasoning can be extended to testing
among more than two hypotheses, and
to Fano's
inequality. There is also an intriguing section, with new-to-me material,
on Van Trees's
inequality, which bounds Bayes risk4 in terms of integrated Fisher information.
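- To spell out the two-point version of the reduction sketched above, in my notation rather than Tsybakov's (and suppressing the conditions on the semi-distance \( d \)): for any two functions \( f_0, f_1 \) in the class and any estimator \( \hat{f} \),
\[ \max_{j \in \{0,1\}} \mathbb{E}_{f_j}\left[ d(\hat{f}, f_j) \right] \geq \frac{d(f_0, f_1)}{2} \inf_{\psi} \max_{j \in \{0,1\}} \mathbb{P}_{f_j}(\psi \neq j), \]
where the infimum is over all tests \( \psi \) trying to decide between the two. So if we can keep \( f_0 \) and \( f_1 \) about \( n^{-\beta/(2\beta+1)} \) apart while no test can reliably tell them apart, the mean-squared error cannot shrink faster than \( n^{-2\beta/(2\beta+1)} \).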
- It will not, I trust, surprise anyone that the lower bounds from Chapter 2
match the upper bounds from Chapter 1.
- The rates obtained in Chapters 1 and 2 depend on the smoothness of the true
function being estimated, which is unknown. It would be very annoying to have
to guess this — and more than annoying to have to guess it
right. An "adaptive" estimator, roughly speaking, is one which doesn't have to
be told how smooth the function is, but can do (about) as well as one which was
told that by an Oracle. The point of chapter 3 is to set up the machinery
needed to examine adaptive estimation, and to exhibit some adaptive estimators
for particular problems, mostly of the projection-estimator/series-expansion
type. Unlike the first two chapters, the text of chapter 3 does not motivate
itself very well, but the plot will be clear to experienced readers.
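- To state the target of chapter 3 a little more explicitly (again my gloss, not Tsybakov's exact formulation): an estimator \( \hat{f}_n \) constructed without knowledge of \( \beta \) is adaptive over a range of smoothness classes if
\[ \sup_{f \in \mathcal{S}(\beta, L)} \mathbb{E}_f\left[ \|\hat{f}_n - f\|^2 \right] \leq C(\beta, L)\, n^{-2\beta/(2\beta+1)} \]
simultaneously for every \( \beta \) in that range, i.e., it pays at most a constant-factor price (for some losses, a logarithmic factor) for not being told the smoothness in advance.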
- The implied reader has a firm grasp of parametric statistical inference (to
the level of, say, Pitman or
Casella
and Berger) and of Fourier analysis, but in principle no more. There is a
lot more about statistical theory than I have included in my quick sketch of
the book's contents, such as the material on unbiased risk estimation,
efficiency and super-efficiency, etc.; the patient reader could figure
this all out from what's in Tsybakov, but either a lot of prior exposure, or a
teacher, would help considerably. There is also nothing about data, or
practical/computational issues (not even a mention of the curse of
dimensionality!). The extensive problem sets at the end of each chapter will
help with self-study, but I feel like this is really going to work best as a
textbook, which is what it was written for. It would be the basis for an
strong one-semester course in advanced statistical theory, or, supplemented
with practical exercises (and perhaps with
All
of Nonparametric Statistics) a first graduate5 class in non-parametric estimation.
- 1: As
you know, Bob, that's the class of all functions which can be
differentiated at least \( \beta \) times, and where the integral of the
squared \( \beta^{\mathrm{th}} \) derivative is no more than \( L \). (Oddly,
in some places Tsybakov's text has \( \beta-1 \) in place of \( \beta \), but I
think the math always uses the conventional
definition.) ^
- 2: To be clear, I'm the one
introducing the character of the Fool here; Tsybakov is more
dignified. ^
- 3: I say "slightly imaginary"
because we're really taking an infimum over all estimators, and there may not
be any estimator which actually attains the infimum. But "infsup" doesn't
sound as good as "minimax". ^
- 4: Since Bayes risk is
integrated over a prior distribution on the unknown function, and minimax risk
is the risk at the single worst unknown function, Bayes risk provides
a lower bound on minimax risk. ^
- 5: For a first undergraduate
course in non-parametric estimation, you could use
Simonoff's Smoothing
Methods in Statistics, or even, if
desperate, Advanced
Data Analysis from an Elementary Point of
View. ^
- Peter J. Diggle and Amanda
G. Chetwynd, Statistics and Scientific Method: An Introduction for
Students and Researchers
- I have mixed feelings about this.
- Let me begin with the good things. The book's heart is very much in the
right place: instead of presenting statistics as a meaningless collection of
rituals, show it as a coherent body of principles, which scientific
investigators can use as tools for inquiry. The intended audience is (p. ix)
"first-year postgraduate students in science and technology" (i.e., what we'd
call first-year graduate students), with "no prior knowledge of statistics",
and no "mathematical demands... beyond a willingness to get to grips with
mathematical notation... and an understanding of basic algebra". After some
introductory material, a toy example of least-squares fitting, and a chapter on
general ideas of probability and maximum likelihood estimation, Chapters 4--10
all cover useful statistical topics, all motivated by real data, which is used
in the discussion*. The book treats regression modeling, experimental design,
and dependent data all on an equal footing. Confidence intervals are
emphasized over hypothesis tests, except when there is some substantive reason
to want to test specific hypotheses. There is no messing about with commercial
statistical software (there is a very brief but good appendix on R), and code
and data are given to reproduce everything. Simulation is used to good effect,
where older texts would've wasted time on exact calculations. I
would much rather see scientists read this than the usual sort of
"research methods" boilerplate.
- On the negative side: The bit about "scientific method" in the title,
chapter 1, chapter 7, and sporadically throughout, is not very good. There is
no real attempt to grapple with the literature on methodology — the only
philosopher cited is Popper, who gets invoked once, on p. 80. I will permit
myself to quote the section where this happens in full.
7.2 Scientific Laws
Scientific laws are expressions of quantitative relationships between variables in nature that have been validated by a combination of observational and experimental evidence.
As with laws in everyday life, accepted scientific laws can be challenged over time as new evidence is acquired. The philosopher Karl Popper summarizes this by emphasizing that science progresses not by proving things, but by disproving them (Popper, 1959, p. 31). To put this another way, a scientific hypothesis must, at least in principle, be falsifiable by experiment (iron is more dense than water), whereas a personal belief need not be (Charlie Parker was a better saxophonist than John Coltrane).
7.3 Turning a Scientific Theory into a Statistical Model...
That sound you hear is pretty much every philosopher of science since Popper
and Hempel, crying out from
Limbo, "Have
we lived and fought in vain?"
- Worse: this also has very little to do with what chapter 7 does, which is fit some regression models relating how much plants grow to how much of the pollutant glyphosate they were exposed to. The book settles on a
simple linear model after some totally ad hoc transformations of the
variables to make that look more plausible. I am sure that the authors —
who are both statisticians of great experience and professional eminence
— would not claim that this model is an actual scientific law, but
they've written themselves into a corner, where they either have to pretend
that it is, or be unable to explain the scientific value of their
model. (On the other hand, accounts of scientific method centered on models,
e.g., Ronald Giere's, have
no particular difficulty here.)
- Relatedly, the book curiously neglects issues of power in model-checking.
Still with the example of modeling the response of plants to different
concentrations of pollutants, section 7.6.8 considers whether to separately
model the response depending on whether the plants were watered with distilled
or tap water. This amounts to adding an extra parameter, which increases the
likelihood, but by a statistically-insignificant amount (p. 97). This ignores,
however, the question of whether there is enough data, precisely-enough
measured, to notice a difference — i.e., the power to detect
effects. Of course, a sufficiently small effect would always be insignificant,
but this is why we have confidence intervals, so that we can distinguish
between parameters which are precisely known to be near zero, and those about
which we know squat. (Actually, using a confidence interval for the difference
in slopes would fit better with the general ideas laid out here in chapter 3.)
If we're going to talk about scientific method, then we need to talk about
ruling out alternatives (as in,
e.g., Kitcher),
and so about power and severity (as in Mayo).
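- To make that concrete, here is a toy sketch of the confidence-interval alternative, on simulated data (not the book's glyphosate experiment): fit separate lines to the two watering groups and report an interval for the difference in slopes, which tells you whether that difference is precisely pinned down near zero or simply unresolved.

```python
# Toy illustration on simulated data (not Diggle and Chetwynd's experiment):
# a 95% confidence interval for the difference between two regression slopes.
import numpy as np
from scipy.stats import linregress, norm

rng = np.random.default_rng(1)

def slope_and_se(x, y):
    fit = linregress(x, y)
    return fit.slope, fit.stderr          # slope estimate and its standard error

x = np.linspace(0, 10, 30)                # shared design for both groups
y_distilled = 2.0 + 0.50 * x + rng.normal(0, 1.0, x.size)
y_tap       = 2.0 + 0.45 * x + rng.normal(0, 1.0, x.size)

b1, se1 = slope_and_se(x, y_distilled)
b2, se2 = slope_and_se(x, y_tap)

diff = b1 - b2
se_diff = np.sqrt(se1 ** 2 + se2 ** 2)    # independent groups
z = norm.ppf(0.975)                       # approximately 1.96
print(f"difference in slopes: {diff:.3f} +/- {z * se_diff:.3f}")
```

A wide interval here says "we cannot tell the groups apart, but they could easily differ"; a narrow interval around zero says "any difference is small", which is exactly the distinction the likelihood comparison alone does not make.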
- This brings me
to lies-told-to-children.
Critical values of likelihood ratio tests, under the standard asymptotic
assumptions, are given in Table 3.2, for selected confidence levels and numbers
of parameters. The reader is not told where these numbers come from (\( \chi^2
\) distributions), so they are given no route to figure out what to do in cases
which go beyond the table. What is worse, from my point of view, is that they
are given no rationale at all for where the table comes from (\(
\chi^2 \) here falls out from Gaussian fluctuations of estimates around the
truth, plus a second-order Taylor expansion), or why the likelihood ratio test
works as it does, or even a hint that there are situations where the usual
asymptotics will not apply. Throughout, confidence intervals and the
like are stated based on Gaussian (or, as the book puts it, capital-N "Normal")
approximations to sampling distributions, without any indication to the reader
as to why this is sound, or when it might fail. (The word "bootstrap" does not
appear in the index, and I don't think they use
the concept at all.) Despite their
good intentions, they are falling back on rituals.
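- For what it's worth, the missing rationale fits in a few lines: under the usual regularity conditions (Wilks's theorem), twice the log of the likelihood ratio is asymptotically \( \chi^2 \)-distributed, with degrees of freedom equal to the number of extra parameters, so a table like 3.2 is just a table of \( \chi^2 \) quantiles. A sketch (mine, not the book's; I am not claiming to reproduce their exact tabulation):

```python
# Where likelihood-ratio critical values come from: under Wilks's theorem,
# 2 * log(likelihood ratio) is asymptotically chi^2 with df equal to the
# number of extra parameters in the larger model.
from scipy.stats import chi2

for conf in (0.90, 0.95, 0.99):           # confidence levels
    for df in (1, 2, 3):                  # numbers of extra parameters
        crit = chi2.ppf(conf, df)         # reject if 2*log(LR) exceeds this
        print(f"confidence {conf:.2f}, {df} parameter(s): {crit:.2f}")
```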
- Diggle and Chetwynd are both very experienced as applied statisticians and as teachers of statistics. They know better in their professional
practice. I am sure that they teach their statistics students better. That
they don't teach the readers of this book better is a real lost
opportunity.
- Disclaimer: I may turn my own
data analysis notes into
a book, which would to some degree compete with this one.
- *: For the record: exploratory data analysis and
visualization, motivated by gene expression microarrays; experimental design,
motivated by agricultural and clinical field trials; comparison of means,
motivated by comparing drugs; regression modeling, motivated by experiments on
the effects of pollution on plant growth; survival analysis, motivated by
kidney dialysis; time series, motivated by weather forecasting; and spatial
statistics, motivated by air pollution monitoring.
- Geoffrey Grimmett and David
Stirzaker, Probability and Random Processes, 3rd
edition
- This is still my favorite stochastic processes textbook. My copy of the
second edition, which has been with me since graduate school, is falling apart,
and so I picked up a new copy at JSM, and of course began re-reading on the
plane...
- It's still great: it strikes a very nice balance between accessibility and
mathematical seriousness. (There is just enough shown of measure-theoretic probability that students can see why it will be useful, without its overwhelming the situations where more elementary methods suffice.) It's
extremely sound at focusing on topics which are interesting because they can be
connected back to the real world, rather than being self-referential
mathematical games. The problems and exercises are abundant and
well-constructed, on a wide range of difficulty levels. (They are now
available separately, with solutions manual,
as One
Thousand Exercises in Probability.)
- I am very happy to see more in this edition on Monte Carlo and on
stochastic calculus. (My disappointment that the latter builds towards the
Black-Scholes model is irrational, since they're
giving the audience what it wants.) Nothing seems to have been dropped from
earlier editions.
- It does have limitations. It's a book about the mathematics of
probabilistic models, but has little to say about how one designs such a model
in the first place. This may be inevitable, since the tools of model-building
must change with the subject matter1. There is also no systematic account here of
statistical inference for stochastic processes, but this is so universal among
textbooks on stochastic processes that it's easier to name
exceptions2 than instances. If a fourth edition were to fix this, I would regard the book as perfect; as it stands, it is merely almost perfect.
- The implied reader has a firm grasp of calculus (through multidimensional
integration) and a little knowledge of linear algebra. They can also read and
do proofs. No prior knowledge of probability is, strictly speaking, necessary,
though it surely won't hurt. With that background, and the patience to tackle
600 pages of math, I unhesitatingly recommend this as a first book on random
processes for advanced undergraduates or beginning graduate students, or for
self-study.
- 1: E.g., tracking stocks and
flows of conserved quantities, and making sure they balance, is very useful in
physics and chemistry, and even some parts of biology. But it's not very
useful in the social sciences, since hardly any social or economic variables of
any interest are conserved. (I had never truly appreciated Galbraith's quip
that
"The
process by which banks create money is so simple that the mind is repelled"
until I tried to explain to an econophysicist that money
is not, in fact, a conserved quantity.) And so
on. ^
- 2: The best exception I've seen
is Peter
Guttorp's Stochastic
Modeling of Scientific Data. It's a very good introduction to
stochastic processes and their inference for an audience who already knows some
probability, and statistics for independent data; it also talks about
model-building. But it doesn't have the same large view of stochastic
processes as Grimmett and Stirzaker's book, or the same clarity of exposition.
Behind that, there
is Bartlett's Stochastic
Processes, though it's now antiquated. From a different tack,
Davison's Statistical
Models includes a lot on models of dependent data, but doesn't
systematically go into the theory of such
processes. ^
Books to Read While the Algae Grow in Your Fur;
Enigmas of Chance;
Scientifiction and Fantastica;
Afghanistan and Central Asia;
Biology;
Writing for Antiquity
Posted at August 31, 2012 23:59 | permanent link