The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi   77

Mathematical Methods of Statistics

by Harald Cramér

Uppsala: Almqvist and Wiksells, 1945.
This was the first textbook on modern mathematical statistics, and it is still one of the best. It is a monument of the movement between the world wars which transformed probability theory and statistics into rigorous and powerful branches of mathematics. What follows is a brief summary of its contents.

The first part is an introduction --- one of the best I've seen --- to the theory of integration and measures, assuming no more than a working knowledge of calculus. (Set theory is introduced as needed.) A measure is in essence a way of assigning a size or weight to sets in a certain space; the integral of a function is a weighted average of its values, the weights being given by the measure. Not all spaces are measurable, nor are all sets within a measurable space, nor are all functions integrable. The main technical work is to make as much measurable, and so integrable, as possible. (The emphasis is on Lebesgue integration in R^n, but Cramér also discusses somewhat more general integrals, and much more general spaces.)

The classical definition of probability, as found in e.g. Laplace, was as follows (modulo some anachronistic language on my part). We start with a set of distinct outcomes or elementary events, all equally probable. We then count up the number which belong to the class of interest A. The ratio of this number to the total number of elementary events is the probability of A. There are many problems with this definition, circularity not least. This doesn't mean it isn't still used by, e.g., physicists, economists and engineers who know no better. A superior definition was however provided by the polymathic Andrei Kolmogorov in 1933; it is the one used by Cramér, and by all other competent authorities. Namely, we define a probability space as a measurable space in which the measure satisfies certain requirements, the ``axioms of probability theory'' --- for instance, the measure of the entire space must be one. Sets represent different kinds of events; the measure of a set is the probability that that kind of event will take place. All the results of the classical definition are recovered when it is appropriate; the notion of equiprobable elementary events is banished; and we are allowed to have probabilities which are not rational numbers. Its main drawback is that, while the classical definition only requires that we be able to count, the modern definition requires that we know measure theory.
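
For concreteness, here is one common way of writing out the axioms. The notation (a sample space \Omega, a collection \mathcal{F} of measurable sets, a measure P) is the now-standard one and is an assumption on my part, not necessarily Cramér's own. The last line sketches how the classical definition falls out as a special case when \Omega is finite and every elementary event gets the same weight.

```latex
% Kolmogorov's axioms for a probability space (\Omega, \mathcal{F}, P)
\begin{align*}
  &\text{(i)}   && P(A) \ge 0 \quad \text{for every event } A \in \mathcal{F}, \\
  &\text{(ii)}  && P(\Omega) = 1, \\
  &\text{(iii)} && P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i)
                  \quad \text{for pairwise disjoint } A_1, A_2, \ldots \in \mathcal{F}.
\end{align*}
% Classical special case: \Omega finite with N elements, each of weight 1/N
\[
  P(A) = \sum_{\omega \in A} \frac{1}{N} = \frac{|A|}{N}.
\]
```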

Kolmogorov's axioms exhaust the meaning of purely mathematical probability. It happens that frequencies in the Realized World lend themselves to probabilistic treatment, i.e. are well-approximated by mathematical models satisfying those axioms. This is why we bother with frequentist, or, as Cramér prefers, ``statistical'' probability. Why empirical frequencies should approximate mathematical probabilities, nobody really knows, but it doesn't seem inherently more problematic than real space's approximating locally Euclidean geometries. (Cramér does not, if memory serves, make that comparison.)

It is natural to make a set of numerical values a probability space; the result is a random variable, ranging over that set. It is sometimes more convenient to treat random variables as functions from abstract, amorphous probability spaces to sets of numbers; no matter of principle is involved. When the range is a continuum, like R or an interval therein, we can define distribution functions, which tell us how probable various sets are, and so specify the measure. Cramér goes through a bestiary of distribution functions (binomial, Gaussian, Poisson, etc.), discussing their properties and proving results about their manipulation, including both laws of large numbers and the central limit theorem. All the usual summary statistics --- mean, variance, kurtosis, etc. --- are covered, along with many less well-known numbers, and calculated for most distributions in terms of their parameters. This would be a natural place to consider stochastic processes --- sequences of random variables, or, if you like, random variables ranging over sequences; for various reasons, however, Cramér has very little on them. This is perhaps the only area where the text is seriously deficient by modern standards, but there are plenty of good recent books to serve as patches.
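
The law of large numbers and the central limit theorem are easy to watch at work numerically. The following is a minimal sketch, not anything taken from Cramér: it assumes fair-coin (Bernoulli) trials, and the function name `sample_means` and its parameters are made up for illustration. It draws many sample means and compares their spread with the value sqrt(p(1-p)/n) which the theory predicts.

```python
import random
import statistics


def sample_means(n_draws, n_repeats, p=0.5, seed=0):
    """Return n_repeats sample means, each over n_draws Bernoulli(p) trials.

    Hypothetical helper for illustration; the seed makes runs repeatable.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        successes = sum(1 for _ in range(n_draws) if rng.random() < p)
        means.append(successes / n_draws)
    return means


if __name__ == "__main__":
    p = 0.5
    for n in (10, 100, 1000):
        means = sample_means(n, n_repeats=500, p=p)
        observed_spread = statistics.pstdev(means)
        predicted_spread = (p * (1 - p) / n) ** 0.5
        # Law of large numbers: the means cluster ever more tightly around p.
        # Central limit theorem: the spread shrinks like 1 / sqrt(n).
        print(n, statistics.fmean(means), observed_spread, predicted_spread)
```

Running it should show the average of the sample means staying near 0.5 while their spread falls by roughly a factor of sqrt(10) each time n grows tenfold.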

The last part of the book is on statistical inference, methods of learning about distributions from partial data; here Cramér follows the three giants of modern statistics, R. A. Fisher, Jerzy Neyman and Egon Pearson. He starts with sampling distributions, i.e. the distributions of small parts (``samples'') of large populations, assuming the population's distribution to be known. (This only seems bass-ackwards.) From there he goes to hypothesis testing, considerations of when to reject an idea as too improbable in the face of the data. The two errors here, which were not clearly distinguished before Neyman and Pearson wrote in the late 1920s, are rejecting a hypothesis when it's true (``type I errors''), and accepting it when it's false (type II; the Roman numerals are de rigueur). The probability of each kind of error can be calculated. Clearly both should be as small as possible, but (past a certain point) there is in general a trade-off between the two, which dictates the design of statistical tests. (I will go no further into this here, referring the curious reader directly to Cramér's book, and to Deborah Mayo's recent defense of Neyman-Pearson testing.) Then we turn to parameter estimation --- assuming that the data comes from one member of a parameterized family of distributions, how well can we guess the parameters from our data? The essential method, due again to Neyman and Pearson, is that of confidence regions, a way of saying ``either the parameters lie in this region, or our data came from an extremely improbable concatenation of events.'' Of course, as the eminent forensic statistician C. Chan remarked, ``improbable events permit themselves the luxury of occurring''; the less willing we are to let their luxuries interfere with our estimates, the broader our confidence regions must be. Both confidence regions and Neyman-Pearson hypothesis testing can be wonderfully counter-intuitive, but Cramér explains them clearly and convincingly.
The last sections discuss analysis of variance and linear regression methods. All parts of the discussion of statistical inference are supplemented with real-world examples, leaning heavily on the (excellent) data provided by the Swedish census.
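
The trade-off between the two kinds of error, and the widening of confidence regions as we demand more confidence, can both be seen in the simplest textbook case. The sketch below is an illustration under stated assumptions and is not drawn from Cramér's own examples: Gaussian data with known standard deviation, a one-sided test of a simple hypothesis against a simple alternative, and hypothetical function names `error_rates` and `confidence_interval`.

```python
from statistics import NormalDist


def error_rates(threshold, mu0, mu1, sigma, n):
    """Error probabilities for a one-sided test of mean = mu0 against mean = mu1 > mu0.

    Assumes n independent Gaussian observations with known sigma, and rejects
    the hypothesis mu0 when the sample mean exceeds `threshold`.
    Returns (type I error probability, type II error probability).
    """
    standard_error = sigma / n ** 0.5
    # Type I: reject mu0 although it is true.
    alpha = 1.0 - NormalDist(mu0, standard_error).cdf(threshold)
    # Type II: accept mu0 although the alternative mu1 is true.
    beta = NormalDist(mu1, standard_error).cdf(threshold)
    return alpha, beta


def confidence_interval(sample_mean, sigma, n, level=0.95):
    """Two-sided confidence interval for a Gaussian mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width


if __name__ == "__main__":
    # Raising the rejection threshold lowers alpha but raises beta.
    for threshold in (0.2, 0.5, 0.8):
        alpha, beta = error_rates(threshold, mu0=0.0, mu1=1.0, sigma=1.0, n=4)
        print(threshold, alpha, beta)
    # Demanding more confidence broadens the interval.
    for level in (0.90, 0.95, 0.99):
        print(level, confidence_interval(0.0, sigma=1.0, n=4, level=level))
```

With the sample size fixed, moving the threshold can only exchange one kind of error for the other; taking more observations is what shrinks both at once.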

This book is a classic, not least for its combination of lucidity and rigor. In recognition of its merits, it is about to be re-issued in an affordable edition. It belongs on the shelf of anyone interested in statistical methods.

xvi + 575 pp., lots of graphs and tables, full bibliographic references, index
Probability and Statistics
Re-printed, as vol. 9 of the Princeton Mathematics Series, by Princeton University Press, 1946; to be issued as a paperback in the Princeton Landmarks in Mathematics and Physics series, April 1999. Currently in print as a hardback, ISBN 0-691-08004-6, US$89.50; paperback edition to be ISBN 0-691-00547-8, US$24.95. LoC QA276 C72
30 March 1999; thanks to Tony Lin for directing me to this book.