Mathematical Methods of Statistics

<title>Harald Cram&eacute;r, Mathematical Methods of Statistics</title>

<cite><a href="../">The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi</a></cite> &nbsp; <strong>77</strong>

<h1>Mathematical Methods of Statistics</h1>

<h2><em>by</em> <a href="../authors.html#harald-cramer">Harald
Cram&eacute;r</a></h2>

Uppsala: Almqvist and Wiksells, 1945.

<hr>

This was the first textbook on modern mathematical statistics, and it is still
one of the best.  It is a monument of the movement between the world wars which
transformed probability theory and statistics into rigorous and powerful
branches of mathematics.  What follows is a brief summary of its contents.

<P>The first part is an introduction --- one of the best I've seen --- to the
theory of integration and measures, assuming no more than a working knowledge
of calculus.  (Set theory is introduced as needed.)  A measure is in essence a
way of assigning a size or weight to sets in a certain space; the integral of a
function, a weighted average of its values, the weights being given by the
measure.  Not all spaces are measurable, nor are all sets within a measurable
space, nor are all functions integrable.  The main technical work is to make as
much measurable, and so integrable, as possible.  (The emphasis is on
Lebesgue integration in <strong>R</strong><sup>n</sup>, but Cram&eacute;r also discusses
somewhat more general integrals, and much more general spaces.)
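<P>A toy illustration may help fix the idea of the integral as a weighted
average.  The sketch below uses a finite set with point weights in place of
Lebesgue measure; the names integrate and size, and the dictionary
representation of a measure, are my own assumptions for the example, not
anything from Cram&eacute;r, and a finite space sidesteps all the real
difficulties about non-measurable sets.

```python
from fractions import Fraction


def size(subset, measure):
    """Measure of a subset: the sum of the weights of its points."""
    return sum(measure[x] for x in subset)


def integrate(f, measure):
    """Integral of f with respect to a discrete measure.

    `measure` maps each point of the space to its weight (a representation
    assumed for this sketch); the integral is the weighted sum of f's values.
    """
    return sum(f(x) * w for x, w in measure.items())


# A measure on {1, ..., 6} giving every point the same weight, 1/6.
uniform = {x: Fraction(1, 6) for x in range(1, 7)}

whole_space = size(uniform, uniform)          # size of the entire space
evens = size({2, 4, 6}, uniform)              # size of one subset
average = integrate(lambda x: x, uniform)     # weighted average of f(x) = x

print(whole_space, evens, average)
```

Changing the weights changes both the sizes of sets and the value of the
integral, which is the sense in which the measure supplies the weights.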

<P>The classical definition of probability, as found in e.g. Laplace, was as
follows (modulo some anachronistic language on my part).  We start with a set
of distinct outcomes or elementary events, all equally probable.  We then count
up the number which belong to the class of interest A.  The ratio of this
number to the total number of elementary events is the probability of A.  There
are many problems with this definition, circularity not least.  This doesn't
mean it isn't still used by, e.g., physicists, economists and engineers who
know no better.  A superior definition was however provided by the polymathic
Andrei Kolmogorov in 1933; it is the one used by Cram&eacute;r, and by all
other competent authorities.  Namely, we define a probability space as a
measurable space in which the measure satisfies certain requirements, the
``axioms of probability theory'' --- for instance, the measure of the entire
space must be one.  Sets represent different kinds of events; the measure of a
set is the probability that that kind of event will take place.  All the
results of the classical definition are recovered when it is appropriate; the
notion of equiprobable elementary events is banished; and we are allowed to
have probabilities which are not rational numbers.  Its main drawback is that,
while the classical definition only requires that we be able to count, the
modern definition requires that we know measure theory.
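<P>The contrast between the two definitions can be sketched on finite spaces,
where no real measure theory is needed.  The example below is only an
illustration under that assumption: the helper names prob and
satisfies_axioms are hypothetical, and on a finite space with point weights
additivity over disjoint events holds automatically, so only non-negativity
and total mass are checked.

```python
from fractions import Fraction
from itertools import product
import math

# Classical definition: count favourable cases among equally likely ones.
outcomes = list(product(range(1, 7), repeat=2))      # two dice, 36 elementary events
event_a = [o for o in outcomes if sum(o) == 7]       # "the dice sum to seven"
classical_p = Fraction(len(event_a), len(outcomes))


def prob(event, measure):
    """Probability of an event: the measure of the corresponding set."""
    return sum(measure[w] for w in event)


def satisfies_axioms(measure, tol=1e-12):
    """Check non-negativity and that the whole space has measure one."""
    nonneg = min(measure.values()) >= 0
    total_one = math.isclose(float(sum(measure.values())), 1.0, abs_tol=tol)
    return nonneg and total_one


# The classical result is recovered by a uniform measure on the 36 outcomes.
dice = {o: Fraction(1, 36) for o in outcomes}

# Nothing forces equal weights or rational values: a coin with an irrational
# probability of heads is a perfectly good probability space.
p_heads = 1 / math.sqrt(2)
coin = {"H": p_heads, "T": 1 - p_heads}

print(classical_p, prob(event_a, dice), satisfies_axioms(coin))
```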

<P>Kolmogorov's axioms exhaust the meaning of purely <em>mathematical</em>
probability.  It happens that frequencies in the Realized World lend themselves
to probabilistic treatment, i.e. are well-approximated by mathematical models
satisfying those axioms.  This is why we bother with frequentist, or, as
Cram&eacute;r prefers, ``statistical'' probability.  Why empirical frequencies
should approximate mathematical probabilities, nobody really knows, but it
doesn't seem inherently more problematic than real space's approximating
locally Euclidean geometries.  (Cram&eacute;r does not, if memory serves, make
that comparison.)

<P>It is natural to make a set of numerical values a probability space; the
result is a random variable, ranging over that set.  It is sometimes more
convenient to treat random variables as functions from abstract, amorphous
probability spaces to sets of numbers; no matter of principle is involved.
When the range is a continuum, like <strong>R</strong> or an interval therein,
we can define distribution functions, which tell us how probable various sets
are, and so specify the measure.  Cram&eacute;r goes through a bestiary of
distribution functions (binomial, Gaussian, Poisson, etc.), discussing their
properties and proving results about their manipulation, including both laws of
large numbers and the central limit theorem.  All the usual summary statistics
--- mean, variance, kurtosis, etc. --- are covered, along with many less
well-known numbers, and calculated for most distributions in terms of their
parameters.  This would be a natural place to consider stochastic processes ---
sequences of random variables, or, if you like, random variables ranging over
sequences; for various reasons, however, Cram&eacute;r has very little on them.
This is perhaps the only area where the text is seriously deficient by modern
standards, but there are plenty of good recent books to serve as patches.
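<P>As a small instance of calculating summary statistics in terms of a
distribution's parameters, the sketch below computes the mean, variance and
excess kurtosis of a binomial distribution directly from its probability
function and compares them with the standard closed forms.  The choice of
n = 10 and p = 1/4 is arbitrary, the function names are mine, and exact
rational arithmetic is used so that the comparison is not blurred by
rounding.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Probability of k successes in n independent trials, for k = 0..n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def mean_of(pmf):
    return sum(k * w for k, w in pmf.items())


def central_moment(pmf, order):
    """Weighted average of (k - mean)**order under the distribution."""
    m = mean_of(pmf)
    return sum((k - m) ** order * w for k, w in pmf.items())


n, p = 10, Fraction(1, 4)
q = 1 - p
pmf = binomial_pmf(n, p)

mean = mean_of(pmf)
variance = central_moment(pmf, 2)
excess_kurtosis = central_moment(pmf, 4) / variance**2 - 3

# Closed forms in terms of the parameters n and p.
print(mean == n * p)
print(variance == n * p * q)
print(excess_kurtosis == (1 - 6 * p * q) / (n * p * q))
```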

<P>The last part of the book is on statistical inference, methods of learning
about distributions from partial data; here he follows the three giants of
modern statistics, R. A. Fisher, Jerzy Neyman and Egon Pearson.  Cram&eacute;r
starts with sampling distributions, i.e. the distributions of small parts
(``samples'') of large populations, assuming the population's distribution to
be known.  (This only <em>seems</em> bass-ackwards.)  From there he goes to
hypothesis testing, considerations of when to reject an idea as too improbable
in the face of the data.  The two errors here, which were <a
href="james-on-errors.html">not clearly distinguished before Neyman and
Pearson</a> wrote in the late 1920s, are rejecting a hypothesis if it's true
(``type I errors''), and accepting it if it's false (type II; the Roman
numerals are <em>de rigueur</em>).  The probability of each kind of error can be
calculated.  Clearly both should be as small as possible, but (past a certain
point) there is in general a trade-off between the two, which dictates the
design of statistical tests.  (I will go no further into this here, referring
the curious reader directly to Cram&eacute;r's book, and to <a
href="../error/">Deborah Mayo's recent defense of Neyman-Pearson testing.</a>)
Then we turn to parameter estimation --- assuming that the data comes from
one member of a parameterized family of distributions, how well can we guess
the parameters from our data?  The essential method, due again to Neyman and
Pearson, is that of confidence regions, a way of saying ``<em>either</em> the
parameters lie in this region, <em>or</em> our data came from an extremely
improbable concatenation of events.''  Of course, as the <a
href="chan-note.html">eminent forensic statistician C. Chan remarked</a>,
``improbable events permit themselves the luxury of occurring''; the less
willing we are to let their luxuries interfere with our estimates, the broader
our confidence regions must be.  Both confidence regions and Neyman-Pearson
hypothesis testing can be wonderfully counter-intuitive, but Cram&eacute;r
explains them clearly and convincingly.  The last sections discuss analysis of
variance and linear regression methods.  All parts of the discussion of
statistical inference are supplemented with real-world examples, leaning
heavily on the (excellent) data provided by the Swedish census.
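<P>The trade-off between the two kinds of error, and the widening of
confidence regions as we grow less tolerant of improbable events, can both be
seen in the simplest textbook setting.  The sketch below assumes (my choices,
not an example from the book) independent Gaussian observations with known
standard deviation, a null hypothesis that the mean is 0 against the
alternative that it is 1, and a test which rejects the null when the sample
mean exceeds a threshold.

```python
from math import sqrt
from statistics import NormalDist


def error_rates(threshold, n, mu0=0.0, mu1=1.0, sigma=1.0):
    """Type I and type II error probabilities for the test
    'reject the null when the sample mean exceeds threshold'."""
    se = sigma / sqrt(n)                              # std. dev. of the sample mean
    alpha = 1 - NormalDist(mu0, se).cdf(threshold)    # reject although the null is true
    beta = NormalDist(mu1, se).cdf(threshold)         # accept although the null is false
    return alpha, beta


def confidence_interval(sample_mean, n, level, sigma=1.0):
    """Two-sided interval for the mean at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Raising the threshold lowers the type I risk and raises the type II risk.
for c in (0.2, 0.5, 0.8):
    alpha, beta = error_rates(c, n=9)
    print(f"threshold {c}: type I {alpha:.4f}, type II {beta:.4f}")

# Demanding more confidence broadens the interval.
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(0.3, n=9, level=level)
    print(f"level {level}: width {high - low:.4f}")
```

With the sample size fixed, the only way to reduce both error probabilities
at once is to change the test itself, which is where the design questions
begin.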

<P>This book is a classic, not least for its combination of lucidity and rigor.
In recognition of its merits, it is about to be re-issued in an affordable
edition.  It belongs on the shelf of anyone interested in statistical methods.

<hr>xvi + 575 pp., lots of graphs and tables, full bibliographic references,
index

<br><a href="../subjects/probability.html">Probability and Statistics</a>

<br>Re-printed, as vol. 9 of the Princeton Mathematics Series, by Princeton
University Press, 1946; to be issued as a paperback in the Princeton Landmarks
in Mathematics and Physics series, April 1999.  Currently in print as a
hardback, ISBN 0-691-08004-6, US$89.50; paperback edition to be ISBN
0-691-00547-8, US$24.95.  LoC QA276 C72

<hr>30 March 1999; thanks to Tony Lin for directing me to this book.