The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi 145

Estimation, Inference, and Specification Analysis

by Halbert White

Cambridge, England: Cambridge University Press, 1994

How to Tell That Your Model Is Wrong; and, What Happens Afterwards

A family of stochastic models is correctly specified when it includes the true data-generating process; otherwise it is more or less mis-specified. If one is only interested in certain aspects of the data, such as conditional expectations (as opposed to full conditional distributions), those can be correctly specified, i.e., matched, by one's model family even if the whole distribution is not. Many of the ordinary results presented in textbooks of statistical theory and econometrics rely, more or less explicitly, on one's models being correctly specified. If the model-space is parametric, i.e., finite-dimensional, and some not-too-restrictive regularity assumptions hold, then the maximum likelihood estimate is consistent, i.e., converges in probability to the true distribution, and indeed asymptotically the ML-estimated parameters not only have a Gaussian distribution around the true parameters, they converge at least as fast as any other consistent estimator. The asymptotic variance of the MLE depends on the Fisher information, which is both the negative of the expected second derivative of the log-likelihood, and the covariance of its vector of first derivatives; that these two expressions coincide is the "information matrix equality". In such a setting, testing (smooth, finite-dimensional) restrictions on the parameters can be done in any of a number of straightforward ways: likelihood ratios, Lagrange multipliers, etc.
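For concreteness, the textbook statements just described can be written out as follows. The notation is my own shorthand, not necessarily White's: $f(x;\theta)$ is the model density, $\theta_0$ the true parameter, $s$ the score, and $I(\theta_0)$ the Fisher information, all under correct specification and the usual regularity conditions.

```latex
% Score vector of a single observation
s(x;\theta) = \nabla_\theta \log f(x;\theta)

% Information matrix equality (holds under correct specification)
I(\theta_0)
  = -\,\mathbb{E}_{\theta_0}\!\left[ \nabla^2_\theta \log f(X;\theta_0) \right]
  = \mathbb{E}_{\theta_0}\!\left[ s(X;\theta_0)\, s(X;\theta_0)^{\top} \right]

% Asymptotic normality of the MLE from n independent observations
\sqrt{n}\,\bigl(\hat{\theta}_n - \theta_0\bigr)
  \xrightarrow{\;d\;} \mathcal{N}\!\bigl(0,\; I(\theta_0)^{-1}\bigr)
```

The second expectation is a covariance because the score has mean zero at $\theta_0$.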

There may well have been an occasion, at some point in the history or future of statistics, where a statistician has written down, or will write, a completely correct parametric stochastic specification for the problem at hand; certainly it would be rash to deny the possibility. It would be stupid, however, to assume that it is at all common, so it would be good to know what happens when a parametric model is more or less mis-specified, and how one can tell. This is the subject of White's monograph. The first chapters are a careful treatment of the asymptotics of maximum likelihood estimation (and quasi-maximum-likelihood) with more or less dependent and heterogeneous data, and exogenous variables, emphasizing the role of uniform laws of large numbers. He then looks at different forms of mis-specification, and how they can lead to failures of consistency, even when one carefully re-defines "consistency" to be as accommodating as possible to mis-specified models. (In brief, instead of converging to the truth, all one asks for is convergence to the best approximation to the truth, in the sense of relative entropy/Kullback-Leibler divergence.) Owing to the robustness of the central limit theorem, asymptotic normality often obtains even in the face of some quite ugly mis-specifications, but the variance of this distribution is in general not given by the Fisher information, since in general the information matrix equality breaks down, and the expected Hessian is not the same as the covariance of the score vector.
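A small simulation may make the last point concrete. This is only a sketch under assumptions of my own choosing, not an example from the book: the working model is exponential with unknown rate, the data are IID, and the function name `exponential_qmle_variances` is hypothetical. With the (minus) average Hessian called A and the average squared score called B, the inverse-information variance is 1/A, while the "sandwich" form B/A² is the one that remains valid under mis-specification.

```python
import numpy as np

def exponential_qmle_variances(x):
    """Fit an exponential model by (quasi-)maximum likelihood.

    Returns (rate_hat, naive_var, sandwich_var); both variances are
    estimates of the asymptotic variance of sqrt(n) * (rate_hat - rate_star),
    where rate_star is the KL-best exponential rate for the data.
    """
    x = np.asarray(x, dtype=float)
    rate_hat = 1.0 / x.mean()          # QMLE of the exponential rate
    # Per-observation log-likelihood: log(rate) - rate * x
    score = 1.0 / rate_hat - x         # first derivative w.r.t. the rate
    hessian = -1.0 / rate_hat**2       # second derivative (does not depend on x)
    A = -hessian                       # minus the average Hessian
    B = np.mean(score**2)              # average squared score
    naive_var = 1.0 / A                # inverse Fisher information
    sandwich_var = B / A**2            # A^{-1} B A^{-1} in one dimension
    return rate_hat, naive_var, sandwich_var

rng = np.random.default_rng(0)
n = 200_000

# Correctly specified: data really are exponential with rate 1.
x_exp = rng.exponential(scale=1.0, size=n)
# Mis-specified: data are Gamma(shape=4, scale=1/4), mean 1, variance 1/4.
x_gamma = rng.gamma(shape=4.0, scale=0.25, size=n)

for label, x in [("exponential data", x_exp), ("gamma data", x_gamma)]:
    rate_hat, naive_var, sandwich_var = exponential_qmle_variances(x)
    print(f"{label}: rate={rate_hat:.3f} "
          f"naive={naive_var:.3f} sandwich={sandwich_var:.3f}")
```

On the exponential data the two variance estimates should roughly agree, as the information matrix equality says they must. On the gamma data the estimated rate still settles near 1, the best exponential approximation, but the inverse-information variance stays near 1 while the sandwich variance is near 1/4, so the naive standard errors would be off by about a factor of two.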

Several chapters are devoted to various tests of mis-specification, mostly checking specific, parameterized violations of assumptions (e.g., are innovations white noise, or themselves an ARMA process?). Some however are more general, such as tests of the information matrix equality itself. (White's discussion of the "encompassing principle" is much clearer than anything I have read from its advocates; basically, the true model should be able to predict what parameters will be estimated within other models, but not vice versa.) I cannot recall any treatment of comparing models which are all mis-specified. There are however some gestures in the direction of non-parametric or semi-parametric statistical methods. (I am a bit surprised there is not more on non-parametrics, since White has done fundamental work on neural networks as universal function approximators.)

The implied reader is already quite familiar with theoretical statistics, at the level of Casella and Berger, Bierens, or Gourieroux and Monfort, as well as conditioning on a filtration and martingales, but is otherwise fairly naive concerning information theory and stochastic processes. Each chapter has a mathematical appendix containing the proofs. Typography is frankly ugly (the book was obviously written in Word), and the reader is supposed to retain the meaning of hat, asterisk, tilde, subscript o, superscript o, subscript and superscript t (not the same), dots, double dots, etc., without the benefit of demarcated definitions or an index of notation. (There is however a helpful index of assumptions and results.) This is enough of a hassle that I've put off reading it for several years; this was foolish. The results are powerful, important to anyone serious about using parametric models for scientific inference, and hard to find elsewhere. I wish I had forced myself to tackle this book much earlier.

380 pp., no illustrations, bibliography, index of assumptions, index of names and subjects

Probability and Statistics; Economics

Currently in print as a hardback, ISBN 978-0-521-25280-5, US$110, and as a paperback, ISBN 978-0-521-57446-4, US$39.99.

14 September 2009