Attention conservation notice: 2100 words on parallels between statistical hypothesis testing and Jamesian pragmatism; an idea I've been toying with for a decade without producing anything decisive or practical. Contains algebraic symbols and long quotations from ancient academic papers. Also some history-of-ideas speculation by someone who is not a historian.

When last we saw the Neyman-Pearson lemma, we were
looking at how to tell whether a data set *x* was signal or noise,
assuming that we know the statistical distributions of noise (call it *p*)
and the distribution of signals (*q*). There are two kinds of mistake we
can make here: a false alarm, saying "signal" when *x* is really noise,
and a miss, saying "noise" when *x* is really signal. What Neyman and
Pearson showed is that if we fix on a false alarm rate we can live with (a
probability of mistaking noise for signal; the "significance level"), there is
a unique optimal test which minimizes the probability of misses --- which
maximizes the **power** to detect signal when it is present. This
is the likelihood ratio test, where we say "signal" if and only
if *q*(*x*)/*p*(*x*) exceeds a certain threshold picked to
control the false alarm rate.
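To make the lemma concrete, here is a minimal sketch in Python, with made-up numbers not drawn from the papers: noise is a standard Gaussian and the signal is Gaussian with mean 2. For this pair the likelihood ratio *q*(*x*)/*p*(*x*) = exp(μ*x* − μ²/2) is increasing in *x*, so thresholding the ratio is the same as thresholding *x* itself, and the threshold falls out of the chosen false-alarm rate.

```python
from statistics import NormalDist

# Toy setup (my numbers, for illustration): noise p = N(0,1), signal q = N(2,1).
# The likelihood ratio q(x)/p(x) is monotone in x, so the Neyman-Pearson test
# "say signal iff x > c" is most powerful once c is set so that
# P_p(x > c) = alpha, the false-alarm rate we are willing to live with.
noise = NormalDist(0, 1)
signal = NormalDist(2, 1)
alpha = 0.05

c = noise.inv_cdf(1 - alpha)          # threshold: P(noise exceeds c) = alpha
power = 1 - signal.cdf(c)             # P(signal exceeds c) = 1 - miss rate

print(f"threshold c = {c:.3f}")
print(f"power against the signal = {power:.3f}")
```

Raising μ (a stronger signal) or raising α (tolerating more false alarms) both raise the power; the lemma only says that, with *p*, *q*, and α fixed, no other test does better.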

The Neyman-Pearson lemma comes from
their 1933 paper; but the
distinction between the two kinds of errors is clearly more fundamental.
Where does *it* come from?

The first place Neyman and/or Pearson use it, that I can see, is their 1928 paper (in two parts), where it's introduced early and without any fanfare. I'll quote it, but with some violence to their notation, and omitting footnoted asides (from p. 177 of part I; "Hypothesis A" is what I'm calling "noise"):

Setting aside the possibility that the sampling has not been random or that the population has changed during its course, *x* must either have been drawn randomly from *p* or from *q*, where the latter is some other population which may have any one of an infinite variety of forms differing only slightly or very greatly from *p*. The nature of the problem is such that it is impossible to find criteria which will distinguish exactly between these alternatives, and whatever method we adopt two sources of error must arise:

- Sometimes, when Hypothesis A is rejected, *x* will in fact have been drawn from *p*.
- More often, in accepting Hypothesis A, *x* will really have been drawn from *q*.

In the long run of statistical experience the frequency of the first source of error (or in a single instance its probability) can be controlled by choosing as a discriminating contour, one outside which the frequency of occurrence of samples from *p* is very small — say, 5 in 100 or 5 in 1000. In the density space such a contour will include almost the whole weight of the field. Clearly there will be an infinite variety of systems from which it is possible to choose a contour satisfying such a condition.... The second source of error is more difficult to control, but if wrong judgments cannot be avoided, their seriousness will at any rate be diminished if on the whole Hypothesis A is wrongly accepted only in cases where the true sampled population, *q*, differs but slightly from *p*.

The 1928 paper goes on to say that, intuitively, it stands to reason that the likelihood ratio is the right way to accomplish this. The point of the 1933 paper is to more rigorously justify the use of the likelihood ratio (hence the famous "lemma", which is really not set off as a separate lemma...). Before unleashing the calculus of variations, however, they warm up with some more justification (pp. 295--296 of their 1933):

Let us now for a moment consider the form in which judgments are made in practical experience. We may accept or we may reject a hypothesis with varying degrees of confidence; or we may decide to remain in doubt. But whatever conclusion is reached the following position must be recognized. If we reject H_{0}, we may reject it when it is true; if we accept H_{0}, we may be accepting it when it is false, that is to say, when really some alternative H_{t} is true. These two sources of error can rarely be eliminated completely; in some cases it will be more important to avoid the first, in others the second. We are reminded of the old problem considered by LAPLACE of the number of votes in a court of judges that should be needed to convict a prisoner. Is it more serious to convict an innocent man or to acquit a guilty? That will depend upon the consequences of the error; is the punishment death or fine; what is the danger to the community of released criminals; what are the current ethical views on punishment? From the point of view of mathematical theory all that we can do is to show how the risk of the errors may be controlled and minimised. The use of these statistical tools in any given case, in determining just how the balance should be struck, must be left to the investigator.

(Neither Laplace nor LAPLACE are mentioned in their 1928 paper.)
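Their point about sliding the "discriminating contour" can be seen numerically. In this toy sketch (my numbers, not Neyman and Pearson's) the noise is N(0,1), the signal is N(2,1), and the rule is "reject Hypothesis A when *x* > *c*"; moving *c* outward buys a lower first-source error at the cost of a higher second-source one.

```python
from statistics import NormalDist

# Illustrative numbers, not from the 1928/1933 papers: noise N(0,1),
# signal N(2,1), decision rule "reject Hypothesis A when x > c".
# Sliding the contour c trades one source of error for the other.
noise, signal = NormalDist(0, 1), NormalDist(2, 1)
for c in (1.0, 1.645, 2.326):
    type1 = 1 - noise.cdf(c)     # false alarm: reject A though x came from p
    type2 = signal.cdf(c)        # miss: accept A though x came from q
    print(f"c = {c:5.3f}   P(type I) = {type1:.3f}   P(type II) = {type2:.3f}")
```

As *c* climbs from 1.0 to 2.326, the type I rate falls from about 0.16 to 0.01 while the type II rate climbs from about 0.16 to 0.63; no choice of *c* shrinks both at once.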

Let's step back a little bit to consider the broader picture here. We have
a question about what the world is like --- which of several conceivable
hypotheses is true. Some hypotheses are ruled out on *a priori*
grounds, others because they are incompatible with evidence, but that still
leaves more than one admissible hypothesis, and the evidence we have does
not *conclusively* favor any of them. Nonetheless, we must choose one
hypothesis for purposes of action; at the very least we will act *as
though* one of them is true. But we may err just as much through rejecting
a truth as through accepting a falsehood. The two errors are symmetric, but
they are not the same error. In this situation, we are advised to pick a
hypothesis based, in part, on which error has graver consequences.

This is *precisely* the set-up of William James's "The Will to
Believe". (It's
easily accessible
online, as are summaries and interpretations; for instance,
an application
to current controversies by Jessa
Crispin.) In particular, James lays great stress on the fact that what
statisticians now call Type I and Type II errors are *both* errors:

There are two ways of looking at our duty in the matter of opinion, — ways entirely different, and yet ways about whose difference the theory of knowledge seems hitherto to have shown very little concern. *We must know the truth; and we must avoid error,* — these are our first and great commandments as would-be knowers; but they are not two ways of stating an identical commandment, they are two separable laws. Although it may indeed happen that when we believe the truth A, we escape as an incidental consequence from believing the falsehood B, it hardly ever happens that by merely disbelieving B we necessarily believe A. We may in escaping B fall into believing other falsehoods, C or D, just as bad as B; or we may escape B by not believing anything at all, not even A.

*Believe truth! Shun error!* — these, we see, are two materially different laws; and by choosing between them we may end by coloring differently our whole intellectual life. We may regard the chase for truth as paramount, and the avoidance of error as secondary; or we may, on the other hand, treat the avoidance of error as more imperative, and let truth take its chance. Clifford ... exhorts us to the latter course. Believe nothing, he tells us, keep your mind in suspense forever, rather than by closing it on insufficient evidence incur the awful risk of believing lies. You, on the other hand, may think that the risk of being in error is a very small matter when compared with the blessings of real knowledge, and be ready to be duped many times in your investigation rather than postpone indefinitely the chance of guessing true.

I myself find it impossible to go with Clifford. We must remember that these feelings of our duty about either truth or error are in any case only expressions of our passional life. Biologically considered, our minds are as ready to grind out falsehood as veracity, and he who says, "Better go without belief forever than believe a lie!" merely shows his own preponderant private horror of becoming a dupe. He may be critical of many of his desires and fears, but this fear he slavishly obeys. He cannot imagine any one questioning its binding force. For my own part, I have also a horror of being duped; but I can believe that worse things than being duped may happen to a man in this world: so Clifford's exhortation has to my ears a thoroughly fantastic sound. It is like a general informing his soldiers that it is better to keep out of battle forever than to risk a single wound. Not so are victories either over enemies or over nature gained. Our errors are surely not such awfully solemn things. In a world where we are so certain to incur them in spite of all our caution, a certain lightness of heart seems healthier than this excessive nervousness on their behalf. At any rate, it seems the fittest thing for the empiricist philosopher.

From here the path to James's will to believe is pretty clear, at least in the form he advocated it, which is that of picking among hypotheses which are all "live"*, and where some choice must be made among them. What I am interested in, however, is not the use James made of this distinction, but simply the fact that he made it.

So far as I have been able to learn, no one drew this distinction between
seeking truth and avoiding error before James, or if they did, they didn't make
anything of it. (Even for Pascal in his wager, the idea that believing in
Catholicism if it is false might be *bad* doesn't register.) Yet this
is just what Neyman and Pearson were getting at, thirty-odd years later. There
is no mention of James in these papers, or indeed of any other source. They
present the distinction as though it were obvious, though eight decades of
subsequent teaching experience shows it is anything but. Neyman and Pearson
were very interested in the foundations of statistics, but seem to have paid no
attention to earlier philosophers, except for the arguable case of Pearson's
father Karl and
his Grammar of
Science (which does not seem to mention James). Yet there it is.
It really looks like two independent inventions of the whole scheme for judging
hypotheses.

My prejudices being what they are, I am much less inclined to think that
James illuminates Neyman and Pearson than the other way around. James was, so
to speak, arguing that we should trade significance — the risk of
mistaking noise for signal — for power, finding some meaningful signal in
what he elsewhere called the "blind molecular chaos" of the physical universe.
Granting that there *is* a trade-off here, however, one has to wonder
about how stark it really is, and whether his
will-to-believe is really the best way to handle it. Neyman and Pearson
suggest we should look for a procedure for resolving metaphysical questions
which maximizes the ability to detect larger meanings for a given risk of
seeing faces in clouds — and would let James and Clifford set their
tolerance for that risk to their own satisfaction. Of course, any such
procedure would have to squarely confront the fact that there may be no way of
maximizing power against *multiple* alternatives simultaneously...

The extension to confidence sets, consisting of all hypotheses not rejected by suitably powerful tests (per Neyman 1937), is left as an exercise to the reader.
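For readers who want the exercise started for them, here is a sketch of that inversion under assumed details not in the post: a single observation *x* from N(μ, 1). The 1 − α confidence set for μ is every μ₀ that a two-sided level-α test of H₀: μ = μ₀ fails to reject, which here recovers the familiar interval around *x*.

```python
from statistics import NormalDist

# Sketch of Neyman's test-inversion construction, with assumed details:
# one observation x from N(mu, 1). A mu0 survives into the confidence set
# exactly when the two-sided level-alpha test of H0: mu = mu0 accepts,
# i.e. when |x - mu0| <= z_{alpha/2}; collecting all such mu0 gives x +/- z.
def confidence_set(x, alpha=0.05):
    """All mu0 not rejected by the level-alpha two-sided test, given datum x."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    return (x - z, x + z)

lo, hi = confidence_set(1.3)
print(f"95% confidence set for the mean: ({lo:.3f}, {hi:.3f})")
```

The coverage guarantee is inherited directly from the tests: each test wrongly rejects a true μ₀ with probability α, so the set misses the truth with probability α, and more powerful tests yield smaller sets.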

*: As an example of a "dead" hypothesis, James gives believing in "the Mahdi", presumably Muhammad Ahmad ibn as-Sayyid Abd Allah. I'm not a Muslim, and those of my ancestors who were certainly weren't Mahdists, but this was still a "What do you mean 'we', white man?" moment in my first reading of the essay. To be fair, James gives me many fewer such moments than most of his contemporaries.

*Manual trackback:* Brad DeLong; Robo;
paperpools (I am not worthy!)

Posted at December 28, 2009 00:08 | permanent link