December 31, 2011

Books to Read While the Algae Grow in Your Fur, December 2011

Attention conservation notice: I have no taste.

Andrea Camilleri, The Wings of the Sphinx; The Track of Sand; The Potter's Field
Delightful as always, though tinged with melancholy, because Montalbano is growing old (and making some questionable personal decisions because of it). The Track of Sand is perhaps the least Dick Francis-like mystery involving horse-racing I have run across.
Peter Bühlmann and Sara van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications
(My mini-review has grown to a few thousand words, complete with figures, equations, and R, so I'll throttle down, and link to the review when I'm finished. In the meanwhile, a book report.)
This is a sound, thorough and reliable guide to what we currently know about linear (generalized linear, additive...) modeling in the high-dimensional regime where the number of adjustable parameters is much larger than the number of observations. The bulk of the book (chapters 2--9) is about the lasso (L1 penalization) and closely related methods. Chapters 2--5 and 9 are largely methodological; the theory comes in chapters 6--8, which are concerned with predictive accuracy, parametric consistency, and variable selection. These theoretical chapters make extensive use of empirical process techniques, which is not surprising considering that van de Geer wrote the book on empirical process theory in estimation. Chapter 14, really a kind of appendix, collects the necessary concepts and results from empirical process theory proper; it is formally self-contained, but probably some prior exposure would be helpful.
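For readers who have not met the lasso, here is a minimal sketch of what "L1 penalization" means when there are more parameters than observations. It minimizes (1/(2n)) ||y - Xb||^2 + lam ||b||_1 by coordinate descent, which is one standard way to compute the lasso and not necessarily how the book presents it. The function names, the simulated data, and the value of lam are all my own assumptions, chosen purely for illustration.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that ignores beta[j]
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Simulated example (hypothetical numbers): 20 observations, 100 predictors,
# only the first three of which actually matter.
random.seed(0)
n, p = 20, 100
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_beta = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + random.gauss(0, 0.1)
     for i in range(n)]

beta_hat = lasso_cd(X, y, lam=0.5)
n_selected = sum(1 for b in beta_hat if b != 0.0)
print("non-zero coefficients:", n_selected, "out of", p)
```

The soft-thresholding step is what sets coefficients exactly to zero, which is why the lasso does variable selection as well as shrinkage. Once lam is at least the largest absolute value of (1/n) X'y, every coefficient stays at zero.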
Chapters 10 and 11 turn to issues of stability and statistical significance in variable selection, closely following recent work by Bühlmann and collaborators. Chapter 12 is a very nice treatment of boosting, where one uses an ensemble of highly-biased and low-capacity, but very stable, models to compensate for each other's faults. Chapter 13, finally, turns to graphical models, especially Gaussian graphical models, looking at ways of inferring the graph based on the lasso principle, on local regression, and, even more closely, the PC algorithm of P. Spirtes and C. Glymour. (This chapter draws on work by Kalisch and Bühlmann on how the PC algorithm works in the high-dimensional regime.) Causal inference is an important application of graphical models, but it is, perhaps wisely, not discussed.
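To make the boosting idea concrete, here is a small sketch of one common variant, componentwise L2 boosting. Each weak learner is a least-squares fit of the current residuals on a single predictor, and only a fraction nu of that fit is added to the ensemble at each step. I am assuming this variant for illustration rather than quoting the book; the function name, the step size, and the simulated data are all hypothetical.

```python
import random


def l2_boost(X, y, n_steps=50, nu=0.1):
    """Componentwise L2 boosting.

    At each step, find the single column whose least-squares fit to the
    current residuals reduces the residual sum of squares (RSS) the most,
    then move the coefficient a fraction nu of the way toward that fit.
    Returns the coefficients and the RSS after each step.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    rss_path = [sum(r * r for r in resid)]
    for _ in range(n_steps):
        best_j, best_coef, best_gain = None, 0.0, 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            dot = sum(X[i][j] * resid[i] for i in range(n))
            gain = dot * dot / col_ss[j]  # RSS drop from a full fit on column j
            if gain > best_gain:
                best_j, best_coef, best_gain = j, dot / col_ss[j], gain
        if best_j is None:
            break  # no column is correlated with the residuals
        step = nu * best_coef  # deliberately weak update
        beta[best_j] += step
        for i in range(n):
            resid[i] -= step * X[i][best_j]
        rss_path.append(sum(r * r for r in resid))
    return beta, rss_path


# Simulated example (hypothetical numbers): more predictors than observations.
random.seed(1)
n, p = 20, 100
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * X[i][0] - 2.0 * X[i][1] + 1.5 * X[i][2] + random.gauss(0, 0.1)
     for i in range(n)]

beta_hat, rss_path = l2_boost(X, y)
print("RSS before and after boosting:", rss_path[0], rss_path[-1])
```

Each individual fit is badly biased, since it uses one variable and is shrunk by nu, but every step can only lower the training RSS, and stopping early keeps the ensemble from using more than n_steps predictors.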
The core chapters (6--8) are much rougher going than the more method-oriented ones, but that's just the nature of the material. (Incidentally, the stark contrast between the tools and concepts used in this book and what one finds in, say, Casella and Berger is a good illustration of how theoretical statistics has been shaped by intuitions about low-dimensional problems which serve us poorly in the high-dimensional regime.) I know of no better, more up-to-date summary of current theoretical knowledge about high-dimensional regression, and how it connects to practical methods. It could be used as a textbook, but for very advanced students; it's really better suited to self-study. For that, however, I can recommend it highly to anyone with a serious interest in the area.
Disclaimer: both authors are the kind of person who might get asked to review my application for tenure.
Tim Groseclose, Left Turn: How Liberal Media Bias Distorts the American Mind
I will, for my sins, have much more to say about this soon.
Here I will just remark on one point which I had to leave out of the longer piece, for reasons of space. The whole analysis is based on models of decision-making by politicians and by media organizations, where they are supposed to get utility, in the strict sense, directly from citing advocacy organizations. Politicians, that is to say, do not shape their speeches with an eye to persuading other legislators, signaling their supporters among voters, signaling their supporters among funders, signaling potential voters or funders, threatening or bargaining with opponents --- nothing except the warm glow of ideological agreement matters to them. (There is such a thing as expressive action, and you can even model parts of it decision-theoretically, but this is not the way.) And yet this gets published in the Quarterly Journal of Economics, when run by those who think "people respond to incentives" is the law and the prophets. What this says about the intellectual and social organization of economics, and its colonies in other social sciences, I will leave to readers to decide.
(No purchase link because I think it's a truly bad book, though I dutifully bought my copy for the exercise.)
Update, August 2012: And the comment is out.
Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design
This has been getting a lot of good press on various R blogs, and deservedly so. It is a clear, sound, user-friendly, no-nonsense introduction to programming through R, pitched at someone who has never programmed before (though not too hand-holding for someone who has). Statistical content is largely confined to the most basic sorts of statistical functions and the detailed examples, of which there are many. Unusual and welcome features: the detailed treatment of factors and tables; the chapters on input/output and on string manipulation; the chapter on debugging. (I am not sure how I feel about the chapter on parallelism: it's an important topic, but it feels too specialized for a first book.)
Naturally, I had complaints. Some of these are the inevitable ones about how I wish there'd been more: about simulation; about formulas and automatically manipulating model-fitting routines; about the split/apply/combine pattern; about working with databases and reshaping data. Others are matters of emphasis: I think Matloff is overly accepting of global variables and global assignment, which in my experience with students just makes things much harder to debug, especially once they start working together. My biggest beef is that Matloff is so focused on the nuts and bolts that he says very little about design principles — that is, about the art of programming. He certainly understands those principles, he even hints at them in the chapter on debugging, but a student would be really lucky to induce them from the book.
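A toy illustration of the complaint about global assignment, written in Python as a stand-in (in R the superassignment operator `<<-` plays the corresponding role). The function names are hypothetical and nothing here comes from Matloff's book; the point is only that the first function's answer depends on hidden history, while the second's depends only on its arguments.

```python
total = 0  # module-level (global) state


def add_to_total_global(x):
    """Updates the global; the result depends on every earlier call."""
    global total
    total += x
    return total


def add_to_total_pure(running_total, x):
    """Takes the state as an argument and returns the new state."""
    return running_total + x


# The same call gives different answers depending on what ran before it...
first = add_to_total_global(5)
second = add_to_total_global(5)

# ...while the pure version can be checked in isolation.
pure_first = add_to_total_pure(0, 5)
pure_again = add_to_total_pure(0, 5)
print(first, second, pure_first, pure_again)
```

When two students' functions both write to the same global, a wrong answer in one can come from a call made in the other, which is what makes such bugs so hard to trace.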
Still, while this is not a perfect fit for my highly specific needs, I wish it had been available in time to assign this fall. I will certainly assign it the next time I teach that class, unless a rival publisher offers a truly striking bribe, or rather, unless something better comes out in the meanwhile.
(Another attraction of Matloff's book, as a textbook, is that it is so cheap. There is even a free PDF draft from September 2009; I haven't checked how much this differs from the published book.)
Madeleine E. Robins, The Sleeping Partner
Mind candy: very slightly alternate-history Regency England private-eye detection. It's a sequel to Point of Honour and Petty Treason. Please go out and buy all three, so that Robins will keep writing them.
Kage Baker, The Bird of the River
Baker's first two fantasy novels set in this world, The Anvil of the World and The House of the Stag, were funny, exciting, well-told. They also had an astonishing quality of contrivance, of every little detail locking together in a single intricate mechanism. Unless I have missed a lot (which is possible), this is merely a well-told fantasy novel which is also about various forms of growing up, and not Baker giving a bravura performance in the role of Providence. There may be a message in this. (Sadly, she died in 2010, far too soon, and there will not be any more of these.)
Matthew Restall and Amara Solari, 2012 and the End of the World: The Western Roots of the Maya Apocalypse
A brief yet thorough and comprehensive debunking of the idea that the ancient Maya thought the world would end on 21 December 2012. Really, however, this is used as an excuse for introducing Maya civilization, the Western apocalyptic tradition, and how the latter was blended into the former after the Conquest. (They do not, sadly from my point of view, go very deeply into the history of modern 2012-ology.) Fast-paced, very clear, and far more polite to the peddlers of this brand of nonsense than they deserve.
Patrick O'Brian, Treason's Harbour, The Far Side of the World, The Reverse of the Medal
I read these too fast.

