The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi   144

Identification for Prediction and Decision

by Charles F. Manski

Cambridge, Massachusetts: Harvard University Press, 2009

Better Roughly Right than Exactly Wrong

A space of stochastic models is identifiable when any two distinct points in the space yield distinct probability distributions over the observables. The name arises because one could then, with enough data, decide which of the two distributions was right; one could identify the distribution. More broadly, parameters (functionals) of stochastic models are identifiable when they are functions of the distribution of observables. This is a probabilistic rather than a statistical concept, but one which is obviously important to statistical work: consistent estimation of a non-identifiable parameter is clearly not possible.
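To make the definition concrete, here is one way of writing it down, followed by a toy example. The notation ($\Theta$, $P_\theta$, $\psi$) and the Gaussian example are my own illustrative choices, not Manski's.

```latex
% Model space: a family of distributions over the observables, indexed by \theta.
\[
  \mathcal{M} = \{ P_\theta : \theta \in \Theta \}
\]
% The model space is identifiable when distinct points give distinct distributions:
\[
  \theta \neq \theta' \;\Longrightarrow\; P_\theta \neq P_{\theta'} .
\]
% A parameter (functional) \psi is identifiable when it depends on \theta
% only through the distribution of the observables:
\[
  P_\theta = P_{\theta'} \;\Longrightarrow\; \psi(\theta) = \psi(\theta') .
\]
% Toy example: X \sim \mathcal{N}(\mu_1 + \mu_2, 1) with \theta = (\mu_1, \mu_2).
% The pairs (0, 1) and (1, 0) give the same distribution, so \theta is not
% identifiable, but the functional \psi(\theta) = \mu_1 + \mu_2 is.
```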

Manski introduced an important notion of partial identification, which happens when even perfect data (that is, knowing the full distribution of the observables) is not enough to pin a parameter down to a single point value, but can impose non-trivial bounds on it. The degree to which a parameter is identifiable, in this sense, depends on the class of models one is willing to consider, i.e., on the strength of the assumptions one is willing to make. The stronger those assumptions, the narrower the bounds, in some cases yielding full or point-identification. This book is an introduction to work by Manski and co-authors on partial identifiability in social-scientific and policy-making problems. The main themes are that (1) many parameters are only very poorly identified with credible assumptions, due to the ubiquity of missing data; but (2) non-trivial partial-identification bounds, based on not-too-strong assumptions, do exist; while (3) the traditional assumptions used to point-identify parameters (e.g., linear homogeneous demand-and-supply curves in economics, plus instrumental variables) are very strong, sometimes completely unfalsifiable, and have little or no basis in established or even conjectural theory. Finally, (4) it is important for both social science and policy to admit to the uncertainty or ambiguity this leaves us with, rather than simply making stuff up so as to be definite. [1]
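The simplest instance of this idea is the mean of a bounded outcome when some outcomes are missing: with no assumptions about the missing values, all one can say is that each lies somewhere in the outcome's range. The sketch below is my own illustration, not code from the book. It computes the sample analog of those worst-case bounds, and then the point value one gets by assuming, much more strongly, that missingness is unrelated to the outcome. The function names and the use of `None` to mark a missing outcome are conventions of this sketch.

```python
from typing import Optional, Sequence, Tuple


def worst_case_mean_bounds(
    outcomes: Sequence[Optional[float]], lo: float = 0.0, hi: float = 1.0
) -> Tuple[float, float]:
    """Sample analog of worst-case bounds on E[Y] when Y is known to lie in [lo, hi].

    Missing outcomes are encoded as None (a convention of this sketch).
    """
    if not outcomes:
        raise ValueError("need at least one unit")
    observed = [y for y in outcomes if y is not None]
    if any(y < lo or y > hi for y in observed):
        raise ValueError("observed outcome outside [lo, hi]")
    n = len(outcomes)
    n_missing = n - len(observed)
    total_observed = sum(observed)
    # Fill every missing value with the smallest, then the largest, possible value.
    lower = (total_observed + lo * n_missing) / n
    upper = (total_observed + hi * n_missing) / n
    return lower, upper


def mean_if_missingness_unrelated(outcomes: Sequence[Optional[float]]) -> float:
    """Point value under the strong assumption that missingness is unrelated to Y."""
    observed = [y for y in outcomes if y is not None]
    if not observed:
        raise ValueError("no observed outcomes")
    return sum(observed) / len(observed)


# Five units with a binary outcome; two outcomes are missing.
sample = [1.0, 0.0, 1.0, None, None]
bounds = worst_case_mean_bounds(sample)
point = mean_if_missingness_unrelated(sample)
print("worst-case bounds:", bounds)
print("point value under the stronger assumption:", point)
```

The width of the interval is the fraction of missing outcomes times the width of the outcome's range, so here the bounds are 0.4 and 0.8. The stronger assumption collapses the interval to the observed mean of 2/3, which is the pattern of "stronger assumptions, narrower bounds" described above.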

Manski is quite serious about the "decision" part of his title. The reason we want to know the kinds of things he's been concerned with is that they tell us what will happen (or tend to happen) when we take different sorts of action in the world. If there is uncertainty about consequences, then there is uncertainty about policy, too. (In this vein, a later paper of Manski's on "actualist rationality" is very much worth reading.) Like many writers, he admires Wald's notion of "statistical decision functions", rules which tell the policy maker what action to take in new cases, as a function of training data; Manski is particularly taken with the idea that a good rule will minimize the maximum regret one will experience as a consequence of using it rather than some other decision function. There are some interesting results here on the implications of partial identifiability for decision functions. There is also a lament that this style of decision theory went out of fashion in the 1960s, in favor of more cut-and-dried Bayesian decision analysis. To my mind, though, this sort of work is still being done, only it's being called "statistical learning theory." When we say that a learning algorithm is "probably approximately correct", for instance, we mean that, given enough training data, with arbitrarily high confidence, the rule the algorithm gives us will, when used to make decisions about new data, have a risk arbitrarily close to that of the best possible rule, no matter what the distribution of the data. (The importance of minimizing regret becomes even clearer in the online-learning literature.) Trading exact optimality for practical non-parametrics seems like a reasonable deal to me. In any case, it would be very cool if anything could be done with this connection.
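For readers who have not met the minimax-regret criterion, a stripped-down sketch may help. It is my own illustration and much simpler than Wald's setting: instead of decision functions mapping training data to actions, it takes a finite table of losses, `loss[a][s]`, for each action `a` in each possible state of the world `s`. The regret of an action in a state is its loss minus the best loss achievable in that state; the criterion picks the action whose worst-case regret is smallest. The function names and the numbers in the table are hypothetical.

```python
from typing import List, Sequence, Tuple


def minimax_regret_action(loss: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (index of the minimax-regret action, its maximum regret).

    loss[a][s] is the loss from taking action a when the true state is s.
    """
    if not loss or not loss[0]:
        raise ValueError("need at least one action and one state")
    n_states = len(loss[0])
    if any(len(row) != n_states for row in loss):
        raise ValueError("every action needs a loss for every state")
    # Best achievable loss in each state, had we known the state in advance.
    best_in_state: List[float] = [
        min(row[s] for row in loss) for s in range(n_states)
    ]
    # Worst-case regret of each action, taken over all states.
    max_regret: List[float] = [
        max(row[s] - best_in_state[s] for s in range(n_states)) for row in loss
    ]
    chosen = min(range(len(loss)), key=lambda a: max_regret[a])
    return chosen, max_regret[chosen]


def minimax_loss_action(loss: Sequence[Sequence[float]]) -> int:
    """For contrast: the action minimizing the worst-case loss itself."""
    return min(range(len(loss)), key=lambda a: max(loss[a]))


# Hypothetical table: three actions (rows), two states of the world (columns).
table = [
    [0, 10],
    [8, 9],
    [5, 12],
]
print("minimax regret:", minimax_regret_action(table))
print("minimax loss:  ", minimax_loss_action(table))
```

In this table the two criteria disagree. Minimax loss picks the second action, whose worst loss (9) is smallest, while minimax regret picks the first, which is never more than 1 worse than the best available action in either state.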

The book should be accessible to anyone with a working knowledge of probability and statistics, including linear regression; it's largely self-contained beyond that, the writing is quite clear, and technicalities are studiously avoided. (I particularly found his discussion of identification in linear simultaneous equation models much better than the usual treatment in econometrics books.) While the numerical examples are more detailed than I really needed, they also seem like they would actually help many readers.

[1]: Aside/quibble: Manski is a bit unfair to Herbert Simon in his last chapter. (Of course, I am not exactly objective where the latter is concerned.) Simon's founding papers on bounded rationality were essentially making computational complexity arguments, i.e., mathematical arguments that we cannot, and therefore do not, act like expected utility maximizers in any non-trivial setting. This was followed by an extensive series of experimental investigations into how we make decisions, not just left hanging there...

368 pp., a few line diagrams, bibliography, index

Probability and Statistics; Economics; Sociology

Currently in print as a hardback, ISBN 978-0-674-02653-7, US$58

5 August 2009