A family of stochastic models is correctly specified when it includes the true data-generating process; otherwise it is more or less mis-specified. If one is only interested in certain aspects of the data, such as conditional expectations (as opposed to full conditional distributions), those can be correctly specified, i.e., matched, by one's model family even if the whole distribution is not. Many of the ordinary results presented in textbooks of statistical theory and econometrics rely, more or less explicitly, on one's models being correctly specified. If the model space is parametric, i.e., finite-dimensional, and some not-too-restrictive regularity assumptions hold, then the maximum likelihood estimate is consistent, i.e., converges in probability to the true distribution, and indeed asymptotically the ML-estimated parameters not only have a Gaussian distribution around the true parameters, they converge at least as fast as any other consistent estimator. The asymptotic variance of the MLE depends on the Fisher information, which is both the negative of the expected second derivative (Hessian) of the log-likelihood and the covariance of its vector of first derivatives (the score); that these two expressions coincide is the "information matrix equality". In such a setting, testing (smooth, finite-dimensional) restrictions on the parameters can be done in any of a number of straightforward ways — likelihood ratios, Lagrange multipliers, etc.
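
The information matrix equality is easy to check numerically for a correctly specified model. The following sketch (my illustration, not anything from the book) uses an Exponential(λ) model, where both the negative expected Hessian and the score covariance reduce to 1/λ²:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                       # true rate of an Exponential(lam) model
x = rng.exponential(1 / lam, size=200_000)

# log f(x; lam) = log(lam) - lam * x, so:
score = 1 / lam - x             # d/dlam of the log-density, per observation
hessian = -1 / lam ** 2         # d^2/dlam^2 of the log-density (constant here)

A = -hessian                    # minus the expected Hessian
B = np.mean(score ** 2)         # covariance of the score (its mean is ~0)

print(A, B)                     # both close to 1/lam^2 = 0.25
```

Under correct specification the two estimates agree, up to Monte Carlo noise.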

There may well have been an occasion, at some point in the history or
future of statistics, where a statistician has written down, or will write, a
completely correct parametric stochastic specification for the problem at hand;
certainly it would be rash to deny the possibility. It would be stupid,
however, to assume that it is at all common, so it would be good to know what
happens when a parametric model is more or less mis-specified, and how one can
tell. This is the subject of White's monograph. The first chapters are a
careful treatment of the asymptotics of maximum likelihood estimation (and
quasi-maximum-likelihood) with more or less dependent and heterogeneous data,
and exogenous variables, emphasizing the role of uniform laws of large numbers.
He then looks at different forms of mis-specification, and how they can lead to
failures of consistency, even when one carefully re-defines "consistency" to be
as accommodating as possible to mis-specified models. (In brief, instead of
converging to the truth, all one asks for is convergence to the best
approximation to the truth, in the sense of relative entropy/Kullback-Leibler
divergence.) Owing to the robustness of the central limit theorem, asymptotic
normality often obtains even in the face of some quite ugly mis-specifications,
but the variance of this distribution is in general *not* given by the
Fisher information, since in general the information matrix equality breaks
down: minus the expected Hessian no longer equals the covariance of the score
vector.
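
A small simulation (again my own illustration, not from the book) shows the breakdown. Fitting an exponential model to Gamma-distributed data, the pseudo-true rate is λ* = 1/E[X]; minus the expected Hessian (A) and the score covariance (B) no longer agree, and it is the "sandwich" A⁻¹BA⁻¹, not 1/A, that matches the actual sampling variance of the quasi-MLE:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2_000, 5_000

# True data: Gamma(shape=2, scale=1), so E[X] = 2, Var[X] = 2.
# Fitted (wrong) model: Exponential(lam); pseudo-true lam* = 1/E[X] = 0.5.
lam_star = 0.5
x = rng.gamma(2.0, 1.0, size=(reps, n))
lam_hat = 1.0 / x.mean(axis=1)           # exponential QMLE, one per replication

A = 1.0 / lam_star ** 2                  # minus expected Hessian = 4
B = np.var(1.0 / lam_star - x)           # covariance of the score ~= Var[X] = 2

naive = 1.0 / A                          # "Fisher" variance: 0.25
sandwich = B / A ** 2                    # A^{-1} B A^{-1}: 0.125

emp = n * np.var(lam_hat)                # empirical variance of sqrt(n)(lam_hat - lam*)
print(naive, sandwich, emp)              # emp matches sandwich, not naive
```

Had the data really been exponential, Var[X] would equal (E[X])², A and B would coincide, and the two variance formulas would agree.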

Several chapters are devoted to various tests of mis-specification, mostly
checking specific, parameterized violations of assumptions (e.g., are
innovations white noise, or themselves an ARMA process?). Some however are
more general, such as tests of the information matrix equality itself.
(White's discussion of the "encompassing principle" is much clearer than
anything I have read from its advocates; basically, the true model should be
able to predict what parameters will be estimated within other models, but not
vice versa.) I cannot recall any treatment of comparing models which
are *all* mis-specified. There are however some gestures in the
direction of non-parametric or semi-parametric statistical methods. (I am a
bit surprised there is not more on non-parametrics, since White has done
fundamental work on neural networks as universal function approximators.)
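
The idea behind testing the information matrix equality itself can be sketched as follows (a toy version of my own devising, which ignores the correction for parameter-estimation noise that a full White-style test includes): form the per-observation discrepancy between squared score and minus Hessian, and check that it averages to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x = rng.exponential(0.5, size=n)         # correctly specified: exponential data

lam_hat = 1.0 / x.mean()                 # exponential MLE (rate ~ 2.0)

# For log f(x; lam) = log(lam) - lam*x, the per-observation indicator is
# score^2 + hessian; it should average to zero when the equality holds.
d = (1.0 / lam_hat - x) ** 2 - 1.0 / lam_hat ** 2

# Naive z-statistic for H0: E[d] = 0 (a real test would also account for
# the sampling noise in lam_hat).
z = np.sqrt(n) * d.mean() / d.std()
print(z)                                 # small under correct specification
```

Under mis-specification, the mean of `d` drifts away from zero and the statistic blows up with n.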

The implied reader is already quite familiar with theoretical statistics,
at the level of Casella and Berger, Bierens, or Gourieroux and Monfort, as
well as with conditioning on a filtration and with martingales, but
is otherwise fairly naive concerning information theory and stochastic
processes. Each chapter has a mathematical appendix containing the proofs.
Typography is frankly ugly (the book was obviously written in Word), and the
reader is supposed to retain the meaning of hat, asterisk, tilde,
subscript *o*, superscript *o*, subscript and
superscript *t* (not the same), dots, double dots, etc., without the
benefit of demarcated definitions or an index of notation. (There is however a
helpful index of assumptions and results.) This is enough of a hassle that
I've put off reading it for several years; this was foolish. The results are
powerful, important to anyone serious about using parametric models for
scientific inference, and hard to find elsewhere. I wish I had forced myself
to tackle this book much earlier.

380 pp., no illustrations, bibliography, index of assumptions, index of names and subjects

Probability and Statistics; Economics

Currently in print as a hardback, ISBN 978-0-521-25280-5 [buy from Powell's], US$110, and as a paperback, ISBN 978-0-521-57446-4 [buy from Powell's], US$39.99.

14 September 2009