Books to Read While the Algae Grow in Your Fur, December 2011
Attention conservation notice: I have no taste.
- Andrea Camilleri, The Wings of the Sphinx; The Track of Sand; The Potter's Field
- Delightful as always, though tinged with melancholy, because Montalbano is
growing old (and making some questionable personal decisions because of it).
The Track of Sand is perhaps the least Dick Francis-like mystery
involving horse-racing I have run across.
- Peter Bühlmann and Sara van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications
- (My mini-review has grown to a few thousand words, complete with figures,
equations, and R, so I'll throttle down, and link to the review when I'm
finished. In the meanwhile, a book report.)
- This is a sound, thorough and reliable guide to what we currently know
about linear (generalized linear, additive...) modeling in the high-dimensional
regime where the number of adjustable parameters is much larger than the number
of observations. The bulk of the book (chapters 2--9) is about the lasso (L1
penalization) and closely related methods. Chapters 2--5 and 9 are largely
methodological; the theory comes in chapters 6--8, which are concerned with
predictive accuracy, parametric consistency, and variable selection. These
theoretical chapters make extensive use of empirical process techniques, which
is not surprising considering that van de Geer wrote the book on empirical
process theory in estimation. Chapter 14, really a kind of appendix, collects
the necessary concepts and results from empirical process theory proper; it is
formally self-contained, but some prior exposure would probably help.
- Chapters 10 and 11 turn to issues of stability and statistical
significance in variable selection, closely following recent work by
Bühlmann and collaborators. Chapter 12 is a very nice treatment of
boosting, where one uses an ensemble of highly-biased and low-capacity, but
very stable, models to compensate for each other's faults. Chapter 13, finally,
turns to graphical models, especially Gaussian graphical models, looking at
ways of inferring the graph based on the lasso principle, on local regression,
and, in even more detail, on the PC algorithm of P. Spirtes and C. Glymour.
(This chapter draws on work by Kalisch and Bühlmann on how the PC algorithm
works in the high-dimensional regime.) Causal inference is an important
application of graphical models, but it is, perhaps wisely, not discussed.
- The core chapters (6--8) are much rougher going than the more
method-oriented ones, but that's just the nature of the material.
(Incidentally, the stark contrast between the tools and concepts used in this
book and what one finds in, say, Casella and Berger is a good illustration of
how theoretical
statistics has been shaped by intuitions about low-dimensional problems which
serve us poorly in the high-dimensional regime.) I know of no better, more
up-to-date summary of current theoretical knowledge about high-dimensional
regression, and how it connects to practical methods. It could be
used as a textbook, but for very advanced students; it's really better suited
to self-study. For that, however, I can recommend it highly to anyone with a
serious interest in the area.
- Disclaimer: both authors are the kind of person who might get
asked to review my application for tenure.
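Since the lasso is what most of the book is about, here is a minimal sketch of the estimator itself, in Python with NumPy rather than R. It minimizes (1/(2n))||y − Xb||² + λ||b||₁ by cyclic coordinate descent with soft-thresholding, which is one standard way of computing the lasso, not necessarily the presentation the book uses. The function names `lasso_cd` and `soft_threshold`, the 1/(2n) scaling of the squared error, and the toy data are all my own assumptions for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Hypothetical helper: fits no intercept (center the data first if needed)
    and assumes no column of X is identically zero.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # x_j'x_j / n for each column
    r = np.asarray(y, dtype=float).copy()       # residual y - X b, with b = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the partial residual that leaves j out
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            delta = b[j] - old
            if delta != 0.0:
                r -= X[:, j] * delta            # keep r = y - X b up to date
                max_change = max(max_change, abs(delta))
        if max_change < tol:                    # a full sweep barely moved anything
            break
    return b

# Toy p >> n problem: 200 candidate predictors, 50 observations, 5 real effects.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0
y = X @ beta + 0.5 * rng.standard_normal(n)
b_hat = lasso_cd(X, y, lam=0.3)
print("selected predictors:", np.flatnonzero(b_hat))
```

Most of the 195 irrelevant coefficients should come out exactly zero here; that exact sparsity comes from the soft-thresholding step, which is what distinguishes L1 penalization from, say, ridge regression.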
- Tim Groseclose, Left Turn: How Liberal Media Bias Distorts the American Mind
- I will, for my sins, have much more to say about this soon.
- Here I will just remark on one point which I had to leave out of the longer
piece, for reasons of space. The whole analysis is based on models of
decision-making by politicians and by media organizations, where they are
supposed to get utility, in the strict sense, directly from citing
advocacy organizations. Politicians, that is to say, do not shape their
speeches with an eye to persuading other legislators, signaling their
supporters among voters, signaling their supporters among funders, signaling
potential voters or funders, threatening or bargaining with opponents:
nothing except the warm glow of ideological agreement matters to them. (There
is such a thing as expressive action, and you can even model parts of it
decision-theoretically, but this is not the way.) And yet this gets published
in the Quarterly Journal of Economics, a journal run by people who think
"people respond to incentives" is the law and the prophets. What this says
about the intellectual and social organization of economics, and its colonies
in other social sciences, I will leave to readers to decide.
- (No purchase link because I think it's a truly bad book, though I dutifully
bought my copy for the exercise.)
- Update, August 2012: And the comment is out.
- Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design
- This has been getting a lot of good press on various R blogs, and
deservedly so. It is a clear, sound, user-friendly, no-nonsense introduction
to programming through R, pitched at someone who has never programmed before
(though not too hand-holding for someone who has). Statistical content is
largely confined to the most basic sorts of statistical functions and the
detailed examples, of which there are many. Unusual and welcome features: the
detailed treatment of factors and tables; the chapters on input/output and on
string manipulation; the chapter on debugging. (I am not sure how I feel about
the chapter on parallelism: it's an important topic, but it feels too
specialized for a first book.)
- Naturally, I had complaints. Some of these are the inevitable ones about
how I wish there'd been more: about simulation; about formulas and
automatically manipulating model-fitting routines; about the
split/apply/combine pattern; about working with databases and reshaping data.
Others are matters of emphasis: I think Matloff is overly accepting of global
variables and global assignment, which in my experience with students just
makes things much harder to debug, especially once they start working together.
My biggest beef is that Matloff is so focused on the nuts and bolts that he
says very little about design principles, that is, about the art of
programming. He certainly understands those principles (he even hints at them
in the chapter on debugging), but a student would be really lucky to induce
them from the book.
- Still, while this is not a perfect fit for my highly specific needs, I wish
it had been available in time to assign this fall. I will certainly assign it
the next time I teach that class, unless a rival publisher offers a truly
striking bribe (or, rather, unless something better comes out in the
meanwhile).
- (Another attraction of Matloff's book, as a textbook, is that it is so
cheap. There is even
a free PDF
draft from September 2009; I haven't checked how much this differs from the
published book.)
- Madeleine E. Robins, The Sleeping Partner
- Mind candy: very slightly alternate-history Regency England private-eye
detection. It's a sequel to Point of Honour and Petty Treason. Please go out
and buy all three, so that Robins will keep writing them.
- Kage Baker, The Bird of the River
- Baker's first two fantasy novels set in this world, The Anvil of the World
and The House of the Stag, were funny, exciting, well-told. They also had an
astonishing
quality of contrivance, of every little detail locking together in a
single intricate mechanism. Unless I have missed a lot (which is possible),
this is merely a well-told fantasy novel which is also about various
forms of growing up, and not Baker giving a bravura performance in the role of
Providence. There may be a message in this. (Sadly, she died in 2010, far too
soon, and there will not be any more of these.)
- Matthew Restall and Amara Solari, 2012 and the End of the World: The Western Roots of the Maya Apocalypse
- A brief yet thorough and comprehensive debunking of the idea that the ancient
Maya thought the world would end on 21 December 2012. Really, however, this is
used as an excuse for introducing Maya civilization, the Western apocalyptic
tradition, and how the latter was blended into the former after the Conquest.
(They do not, sadly from my point of view, go very deeply into the history of
modern 2012-ology.) Fast-paced, very clear, and far more polite to the
peddlers of this brand of nonsense than they deserve.
- Patrick O'Brian, Treason's Harbour, The Far Side of the World, The Reverse of the Medal
- I read these too fast.
Books to Read While the Algae Grow in Your Fur;
Pleasures of Detection, Portraits of Crime;
The Commonwealth of Letters;
Enigmas of Chance;
Scientifiction and Fantastica;
Psychoceramica;
Writing for Antiquity;
Commit a Social Science;
The Running-Dogs of Reaction
Posted at December 31, 2011 23:59