Books to Read While the Algae Grow in Your Fur, August 2011

Vladimir Vovk, Alex Gammerman and Glenn Shafer, Algorithmic Learning in a Random World
This is a badly-written book full of interesting results and ideas. The basic goal is simple: rather than making point forecasts, make predictions in the form of confidence sets, in such a way that the stated confidence level really does correspond to the actual probability of being right. An obvious approach would be to use Bayesian updating to form posterior-predictive sets, but those come with no guarantees of correct coverage, unless the prior is right, and indeed the Bayesian posterior probabilities can be arbitrarily bad (which is one reason why Bayesians need to test their models). Another tack would be to form a frequentist predictive distribution, but, while these exist, they're finicky and delicate.
The trick used in this book is wonderfully simple. Suppose data points are exchangeable (i.e., come from a "random world"), and we have a goodness-of-fit test which gives us a sensible (uniformly distributed) p-value. After observing a sequence of n data-points, consider all possible values for data-point n+1, and calculate their p-values. The ones which cannot be rejected at level α form the prediction set, with confidence level 1-α. All that is really needed for this to work is that we have some way of measuring the discrepancy or "conformity" of one data point with the others which gives uniformly-distributed ranks under the null hypothesis*. (This is why the authors call their scheme "conformal prediction"; it has nothing to do with conformal mappings in geometry, much less conformal field theory.) Actually calculating the prediction set in a reasonable way depends on the details of the conformity measure; they show that nearest-neighbor prediction, ridge regression, and some sorts of support vector machines are fairly easily handled.
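To make the recipe concrete, here is a minimal sketch in Python. It is my own illustration, not an algorithm taken from the book. It assumes real-valued observations, a deliberately crude nonconformity score (distance from the mean of the bag, candidate included), and a finite grid of candidate values standing in for "all possible values". The names `conformal_p_value`, `prediction_set`, `grid` and `alpha` (the rejection level) are all hypothetical.

```python
def conformal_p_value(observed, candidate):
    """p-value for the hypothesis that `candidate` is the next exchangeable draw.

    Assumption for illustration: nonconformity = distance from the mean of the
    augmented bag (the n observations plus the candidate).
    """
    bag = list(observed) + [candidate]
    mean = sum(bag) / len(bag)
    scores = [abs(x - mean) for x in bag]
    # Fraction of points in the bag at least as strange as the candidate.
    # Under exchangeability the candidate's rank is uniform, so this
    # fraction is <= alpha with probability at most alpha.
    at_least_as_strange = sum(1 for s in scores if s >= scores[-1])
    return at_least_as_strange / len(bag)


def prediction_set(observed, grid, alpha):
    """Candidates on `grid` that cannot be rejected at level `alpha`."""
    return [y for y in grid if conformal_p_value(observed, y) > alpha]
```

For instance, `prediction_set([0.0, 1.0, 2.0, 3.0, 4.0], [2.0, 100.0], 0.2)` keeps 2.0 and drops 100.0, since the latter is the single strangest point among six and so gets p-value 1/6. Swapping in a different score (nearest-neighbor distances, regression residuals) changes the shape of the sets, but not the coverage argument.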
The basic idea can be elaborated into predicting distributions ("Venn predictors"), into conditional confidence levels, into rescuing Bayesian prediction intervals, and in some situations into handling dependent data. For the last, they consider a set-up they call "on-line compression modeling", which amounts to postulating what Lauritzen calls a "totally sufficient" statistic, i.e., one which not only is sufficient in the ordinary sense, but which can be updated recursively, and screens off past and future observations. (Actually, I think that all they really need is a predictive Markovian representation, which can be constructed in great generality; in continuous time and for non-stationary processes, even.)
The book is, as I said, badly written. Formally, it only requires knowledge of stochastic processes to the point of understanding exchangeability (and de Finetti's theorem), martingales and Markov processes (and there are appendices to refresh the reader on measure-theoretic probability), and of statistics as far as regression, goodness-of-fit testing and confidence intervals. In practice, readers will find acquaintance with standard machine learning ideas, as found in e.g. Hastie, Tibshirani and Friedman, essential. Even with this background, the brilliant clarity of the main ideas is obscured by a large mass of unnecessary detail, non-standard notation and terminology (e.g., refusing to consider sequences of observations in favor of multisets, a.k.a. "bags", indicated by extra symbols; or eschewing the idea of sufficiency in the chapters on "on-line compression modeling"), and some rather dubious philosophy. (The distinction between "inductive" and "transductive" learning is neither defensible** nor even fruitful, and I say this with very deep respect for Vladimir Naumovich.) The obvious connections to frequentist prediction intervals, and to Butler's predictive likelihood, go unexplored. This is all unfortunate, but until someone writes a cleaner and clearer account of the theory, I have little choice but to recommend this to anyone with a serious interest in machine learning or statistical prediction.
*: I am indebted to Larry Wasserman for pointing out the importance of uniform ranking, and for discussing his work on extending these results, which he really ought to publish.
**: Supposedly, "transduction" is reasoning directly from the properties of individual observed cases to those of individual unobserved cases, without first inducing a general rule, and then deducing specific instances from it. Clearly, any inductive procedure can be turned into a transductive one simply by composition of functions. Conversely, any transductive procedure can be turned into an inductive one, by considering hypothetical new unobserved cases so as to map out the general rule. This is thus a distinction without a difference in terms of capacities. At most there might be a difference in terms of algorithmic representations (and computational complexity), but that's not relevant to the probabilistic or statistical theory undertaken here.
Update, 1 September: Shiva Kaul writes to remonstrate with me about transduction. I quote his letter (with permission):
I think transduction (in the modern sense of the word, perhaps not what Vovk et al discuss) is statistically distinct from induction. I'm not aware of any transductive sample complexity upper bounds that beat corresponding lower bounds for inductive sample complexity. However, transductive upper bounds often beat inductive ones, e.g., "Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing".
The reduction you posted doesn't work for matrix completion. By considering a hypothetical new missing entry, one eliminates a present entry, which could change the predicted values for the other missing entries.
My superficial impression from the paper Shiva points me to is that it deals with a finite set of objects (entries in a matrix), and the difference between the "inductive" and "transductive" set-ups comes from the former sampling entries with replacement, which is kind of silly in this context, while the latter does not. But clearly I need to read and think more deeply before being entitled to an opinion. (This concludes this edition of Shalizi Smackdown Watch.)
Tony Judt, Postwar: A History of Europe Since 1945
A massive, but utterly satisfying, total history of the European subcontinent since the close of the Second World War, which, of necessity, involves going back before the war for many things. Judt makes no secret of the fact that his sympathies lie with anti-Communist liberal social democracy. (He strives very hard to be fair: his portrait of Thatcher, for instance, shows real respect, though no admiration. But I am clearly not best-placed to say if he succeeded.) Accordingly, to his mind the great and incredible accomplishment of western Europe is not just its recovery, but the construction, in the democratic welfare states, of one of the most free and most just forms of life humanity has yet known, intertwined with a new and uniquely peaceful form of international relations through the European Union. (He is very sound on the role the United States played in encouraging these developments, which we should be proud of.) That all these institutions were created with mixed motives, and are more or less flawed and corrupt, goes with their being human creations, and does not reduce their accomplishments. This story is contrasted, intelligently, with that of eastern Europe under Communist rule, ending with its startlingly peaceful dissolution, with due attention paid to Gorbachev's remarkable, if entirely unintentional, achievements. (The one place where I find myself seriously questioning Judt's interpretations is his insistence that the Soviet economy could not be reformed without undermining Communist rule. Here he draws on local economists like János Kornai, and the argument even makes some sense, but how does it explain China and Vietnam?)
Judt does an outstanding and remarkable job of giving even coverage across space, across time, and across domestic and international politics, the economy, social life, popular and high culture, intellectual affairs, and connections and contrasts among all of these. (The only major area of endeavor he slights is the history of science and technology, for understandable reasons.) He moves seamlessly and illuminatingly from the economics of post-war reconstruction to criticism of films of the 1940s, and then to a [very characteristic] consideration of the content of collective memories of the war. Remarkably, he accomplishes all of this while not presuming that his readers know the story already. I recommend it most highly.
— Some of the passages here are recycled from essays collected in Reappraisals (or perhaps vice versa, considering how long he was working on this book).
Charlie Stross, The Fuller Memorandum
Mind-candy. Continuing Lovecraftian spy-fiction, with office comedy. These have never been quite as light-hearted as they first seem, but this one has some genuinely creepy and disturbing scenes and images. Enjoyable independently of previous books in the series, for certain values of "enjoyment".
(But it seems to me that Bob is unduly shaken in his atheism. [Since this all comes up in the first few pages, I don't count it as spoilers.] Yes, his universe has immensely powerful and ancient alien intelligences, some of whom take an interest in humanity. But that no more makes it a genuinely theistic universe than that of a Helicobacter living in a human gut. Ancient, powerful entities operating under weird-seeming rules of physics are not eternal, omnipotent supernatural beings. This is another expression of MacLeod's apophatic atheology.)
Margaret Maron, Storm Track; Slow Dollar; High Country Fall; Rituals of the Season; Winter's Child; Hard Row; Death's Half Acre
Why yes, I did basically spend a week in bed trying to distract myself from dental pain, how could you tell? These books go down like small, pleasant bits of candy, but like a lot of mystery stories they are also social fiction, the on-going theme here being the transformation of rural society in the South.
Benjamin I. Schwartz, The World of Thought in Ancient China
Fairly standard exposition of Chinese philosophy and some of its background through, roughly, the beginning of the Qin dynasty and the First Emperor, i.e., mostly the Hundred Schools of the Warring States period. I did not actually find it any more enlightening than, say, Fung Yu-lan's old book, let alone something like Graham's Disputers of the Tao. The main distinguishing features of Schwartz's book seem to be as follows. (1) Presuming the reader is already familiar with the broad outlines of the history, both political and intellectual. (2) Spending a lot of time disputing modern writers without bothering to fully expound their views (e.g., the argument with Fingarette in the chapter on Confucius, or with Needham in the chapter on cosmology*, both of which would have been impenetrable had I not read the other authors first), or even contrasting with more-or-less fashionable thinkers of the early 1980s (Geertz?!). The occasional stabs at, say, contrasting Confucius's ideas about ethics in public and private life with those of Plato and Aristotle are not sustained enough to really count as comparative history. Finally, (3) many very vague causal speculations, e.g., that the prevalence of ancestor worship made Chinese civilization more receptive to "universal monarchy" than other parts of the world. (I don't suppose that's impossible, but how on Earth could we tell?) In the end, I got a bit bored, and wouldn't really recommend this for non-specialists; try Disputers instead, or even Waley's vintage but engaging Three Ways of Thought in Ancient China. I am not, of course, qualified to say if it has any value for specialists in Chinese intellectual history.
Update: I am told, by someone who took Schwartz's classes at Harvard, that he was an inspiring teacher; I can well believe it. It's striking, and from my point of view a bit sad, how often great teaching fails to translate to the printed page, or for that matter vice versa.
*: To be clear, I think that Schwartz is right in his criticisms of Fingarette and Needham. The former's book on Confucius is a mere period piece from a now-abandoned phase of analytical philosophy; the latter engaged in a lot of speculation, wishful thinking, and sheer projection when writing about the "five elements" school. (This does not invalidate the scholarly value of Science and Civilisation in China.) But these hardly seem like the most important things to say about either school.
Megan Lindholm, Luck of the Wheels
Mind-candy fantasy novel; the fourth book in a series I haven't read, which I picked up because Lindholm's The Wizard of the Pigeons is a neglected classic of urban fantasy (from before that sub-genre got locked into its current formula), and I was curious about her other books. The first two-thirds or so of Luck of the Wheels is an amusing picaresque with some truly dreadful adolescents, followed by a blood-soaked revenge drama, finishing with what under the circumstances has to count as a happy ending, though from the viewpoint of the start of the novel it's an utter disaster. I am especially intrigued by the fact that every step in this transformation follows plausibly from the previous one. I will keep an eye peeled for the other books in this series.
— Incidentally, until looking up her website just now, I had no idea that Lindholm also writes lap-breaker fantasy epics as "Robin Hobb"; that answers my question about whatever happened to her...
Lois McMaster Bujold, Falling Free
Early and comparatively unpolished Bujold, which I had somehow never read before. It's not as masterful as her later works — in particular, the characters are not as richly developed. But even early, lesser Bujold is deeply entertaining. (The cover art of my old paperback copy is, as usual with this publisher, needlessly horrid; I am tempted to buy the NESFA Press edition simply to replace it with something bearable.)
Trey Shiels, The Dread Hammer
Mind-candy; fantasy full of the sort of no-good-can-come-of-this behavior you find in so many fairy tales, and for that matter epics. I will be reading the sequel. — "Shiels" is the open pen-name of Linda Nagata, who wrote some excellent hard science fiction novels in the 1990s and early 2000s, and then, well, went away for a while. This is not very much like her earlier books in theme or even style, but still good.
Patrick R. Laughlin, Group Problem Solving
A summary of research by experimental social psychologists on problem solving by groups of American college students, with special reference (not unreasonably!) to the contributions of one P. R. Laughlin and collaborators. These experiments are done on WEIRD subjects, and the problems are deliberately artificial, so there are the usual worries about generalizing to other contexts. (Is problem solving by, say, engineering designers really very much like cryptarithmetic?) But the experiments do nonetheless show some extremely interesting phenomena, and a general pattern of minimally-organized groups doing as well as or better than the best individuals, under fairly careful controls. This book should really be taken more as an extended (158 pp.) review paper than a comprehensive treatise, and you have to brace yourself for a psychologist's idea of prose (and indeed a psychologist's idea of what constitutes a "theoretical model"; the online first chapter is representative in both respects), but it's a fast read, and full of useful information for anyone concerned with collective cognition. (The price for the hard-back edition is, however, outrageous.)
Duncan J. Watts, Everything Is Obvious, Once You Know the Answer
I'll actually try to give this a full write-up later, but in the meanwhile I will say: (1) this is great and recommended unreservedly; if you like this weblog at all you should definitely read it; (2) Tom Slee's review is very good; and (3) Duncan's been a friend since Santa Fe days, so feel free to discount my praise, but if I thought this was bad I'd just stay decently quiet about it.
Naomi Novik, Empire of Ivory
Mind-candy; enjoyable continuation of the series about dragons in the Napoleonic wars, in which Our Heroes venture to Africa, and the forces of European imperialism and the slave trade are righteously repelled. Of course, given the situation Novik has set up in her version of Africa, there is no way in Hell the trans-Atlantic slave trade could have begun in the first place; and no slave trade means astoundingly different European colonies in the Americas, if any at all, hence no French Revolution and no Napoleon. In short, the usual problem with alternate histories. (On examination, as so often, Timothy Burke said it first, and better.) But I will still get the sequel, because I want to know how she'll get her heroes out of the soup she lands them in at the end. — It's been long enough since I read the earlier installments that I found the catch-up parts welcome, and you could probably read this without the previous books, but I'd recommend starting the series at the beginning.
Karin Slaughter, Fallen
Absorbing, gruesome and wrenching as usual. I am not quite sure that it matches the past laid out in earlier books in the series, but this merely makes me want to go back and re-read them. (The coincidence of this book's title with one of Kathleen George's is I think due to the English language's sheer poverty of short, vaguely ominous phrases. But by this point, Slaughter could call a book Kittens and Flowers and it would fill me with apprehension.)

Posted at August 31, 2011 23:59

