Attention conservation notice: Quasi-teaching note giving an economic interpretation of the Neyman-Pearson lemma on statistical hypothesis testing.

Suppose we want to pick out some sort of signal from a background of noise.
As every schoolchild knows, any procedure for doing this,
or **test**, divides the data space into two parts, the one where
it says "noise" and the one where it says "signal".* Tests will make two kinds
of mistakes: they can take noise to be signal, a **false
alarm**, or can ignore a genuine signal as noise,
a **miss**. Both the signal and the noise are stochastic, or we
can treat them as such anyway. (Any determinism distinguishable from chance is
just insufficiently complicated.) We want tests where
the *probabilities* of both types of errors are small. The probability
of a false alarm is called the **size** (or **significance
level**) of the test; it is the measure of the "say 'signal'" region
under the noise distribution. The probability of a miss, as opposed to a false
alarm, has no short name in the jargon, but one minus the probability of a miss
— the probability of detecting a signal when it's present — is
called **power**.
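To make the two error probabilities concrete, here is a minimal sketch under assumptions that are mine, not part of the argument: the noise is a standard Gaussian, the signal is the same Gaussian shifted up by one, and the test says "signal" whenever the observation exceeds a cutoff. The name `size_and_power` is made up for the illustration.

```python
from statistics import NormalDist

# Illustrative assumption: noise ~ N(0, 1), signal ~ N(1, 1).
noise = NormalDist(mu=0.0, sigma=1.0)
signal = NormalDist(mu=1.0, sigma=1.0)

def size_and_power(cutoff):
    """Size and power of the test that says 'signal' when x > cutoff."""
    size = 1.0 - noise.cdf(cutoff)    # false-alarm probability
    power = 1.0 - signal.cdf(cutoff)  # one minus the miss probability
    return size, power

for cutoff in (0.0, 1.0, 2.0):
    size, power = size_and_power(cutoff)
    print(f"cutoff={cutoff:.1f}  size={size:.3f}  power={power:.3f}")
```

Raising the cutoff shrinks the "say 'signal'" region, so size and power fall together; the two kinds of error trade off against each other.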

Suppose we know that the probability density of the noise is \( p \) and that of the
signal is \( q \). The Neyman-Pearson lemma, as many though not all
schoolchildren know, says that then, among all tests of a given size \( s \),
the one with the smallest miss probability, or highest power, has the form "say
'signal' if \( q(x)/p(x) > t(s) \), otherwise say 'noise'," and that the
threshold \( t \) decreases as \( s \) increases. The quantity \( q(x)/p(x) \)
is the **likelihood ratio**; the Neyman-Pearson lemma says that to
maximize power, we should say "signal" if it is sufficiently *more likely*
than noise.
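Here is a hedged sketch of what the lemma delivers in one case where the answer is not just "say 'signal' when \( x \) is big". The assumption, mine and purely illustrative, is that noise and signal are both centered Gaussians, but the signal has twice the standard deviation. Then \( q(x)/p(x) \) grows with \( |x| \), so the likelihood-ratio region is two-sided. The function names are invented for the example.

```python
from statistics import NormalDist

# Illustrative assumption: noise ~ N(0, 1), signal ~ N(0, 2^2).
noise = NormalDist(mu=0.0, sigma=1.0)
signal = NormalDist(mu=0.0, sigma=2.0)

def likelihood_ratio(x):
    """q(x) / p(x) for the two assumed densities."""
    return signal.pdf(x) / noise.pdf(x)

def lr_test(s):
    """Return (cutoff c, threshold t, power) for the size-s likelihood-ratio test.

    Here q/p grows with |x|, so {q/p > t} is the two-sided region {|x| > c}.
    """
    c = noise.inv_cdf(1.0 - s / 2.0)     # P(|x| > c) = s under the noise
    t = likelihood_ratio(c)              # the matching threshold on the ratio
    power = 2.0 * (1.0 - signal.cdf(c))  # Q(|x| > c)
    return c, t, power

def one_sided_power(s):
    """Power of the same-size test 'say signal when x > c', for comparison."""
    c = noise.inv_cdf(1.0 - s)
    return 1.0 - signal.cdf(c)

for s in (0.01, 0.05, 0.10):
    c, t, power = lr_test(s)
    print(f"s={s:.2f}  t={t:.2f}  LR power={power:.3f}  "
          f"one-sided power={one_sided_power(s):.3f}")
```

In this setup the threshold falls as the size rises, and at each size the likelihood-ratio region beats the one-sided region on power, as the lemma says it must.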

The likelihood ratio indicates how different the two distributions —
the two **hypotheses** — are at \( x \), the data-point we
observed. It makes sense that the outcome of the hypothesis test should depend
on this sort of discrepancy between the hypotheses. But why
the *ratio*, rather than, say, the difference \( q(x) - p(x) \), or a
signed squared difference, etc.? Can we make this intuitive?

Start with the fact that we have an optimization problem under a constraint. Call the region where we proclaim "signal" \( R \). We want to maximize its probability when we are seeing a signal, \( Q(R) \), while constraining the false-alarm probability, \( P(R) = s \). Lagrange tells us that the way to do this is to maximize \( Q(R) - t[P(R) - s] \) over \( R \) and \( t \) jointly. So far the usual story; the next turn is usually "as you remember from the calculus of variations..."

Rather than actually doing math, let's think like economists. Picking the
set \( R \) gives us a certain benefit, in the form of the power \( Q(R) \),
and a cost, \( tP(R) \). (The \( ts \) term is the same for all \( R \).)
Economists, of course, tell us to equate *marginal* costs and benefits.
What is the marginal benefit of expanding \( R \) to include a small
neighborhood around the point \( x \)? Just, by the definition of
"probability density", \( q(x) \). The marginal cost is likewise \( tp(x) \).
We should include \( x \) in \( R \) if \( q(x) > tp(x) \), or
\( q(x)/p(x) > t \). The boundary of \( R \) is where marginal benefit equals marginal
cost, and that is why we need the likelihood *ratio* and not the
likelihood *difference*, or anything else. (Except for a monotone
transformation of the ratio, e.g. the log ratio.) The likelihood ratio
threshold \( t \) is, in fact, the
shadow price of
statistical power.
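The marginal argument can be checked by brute force on a toy problem. What follows is a sketch under assumptions of my own: a six-point data space with made-up probabilities (kept as exact fractions so that comparisons are exact) and a price \( t = 2 \). It compares the region picked point by point with the best region found by trying every subset, and with a rival rule that thresholds the difference \( q(x) - p(x) \). All names are invented for the illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical six-point data space; probabilities are exact hundredths.
p = [Fraction(k, 100) for k in (40, 30, 20, 6, 3, 1)]    # noise
q = [Fraction(k, 100) for k in (10, 20, 35, 15, 12, 8)]  # signal
points = range(len(p))

def P(region):
    """False-alarm probability (size) of a region."""
    return sum(p[x] for x in region)

def Q(region):
    """Detection probability (power) of a region."""
    return sum(q[x] for x in region)

def all_regions():
    """Every subset of the data space."""
    for k in range(len(p) + 1):
        for subset in combinations(points, k):
            yield frozenset(subset)

t = Fraction(2)  # assumed price per unit of false-alarm probability

# Marginal rule: include x when its benefit q(x) exceeds its cost t * p(x).
marginal_region = frozenset(x for x in points if q[x] > t * p[x])

# Brute force 1: the region maximizing the net benefit Q(R) - t * P(R).
best_net_region = max(all_regions(), key=lambda R: Q(R) - t * P(R))

# Brute force 2: the highest power attainable without exceeding the same size.
budget = P(marginal_region)
best_power = max(Q(R) for R in all_regions() if P(R) <= budget)

# A rival rule that thresholds the difference q(x) - p(x) instead.
difference_region = frozenset(
    x for x in points if q[x] - p[x] > Fraction(1, 10)
)

for name, R in (("marginal", marginal_region),
                ("best net", best_net_region),
                ("difference", difference_region)):
    print(f"{name:10s} {sorted(R)}  size={float(P(R)):.2f}  power={float(Q(R)):.2f}")
print("best power within budget:", float(best_power))
```

With these numbers the marginal rule picks the last three points, for size 0.10 and power 0.35; the exhaustive search over net benefit lands on the same region, and no region within that size does better on power. The difference rule picks a single point with the same power but size 0.20, twice the false-alarm rate for nothing.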

I am pretty sure I have not seen or heard the Neyman-Pearson lemma explained marginally before, but in retrospect it seems too simple to be new, so pointers would be appreciated.

*Manual trackback*: John Barrdear;
Econometrics Beat;
Steve Laniel

*Updates*: Thanks to David Kane for spotting a typo.

17 January 2012: more and yet more about the Neyman-Pearson lemma.

15 July 2012: fixed a typo which had "minimize" where I meant "maximize".

*: Yes, you could have a randomized test procedure, but the situations where those actually help pretty much define "boring, merely-technical complications."

Posted at November 08, 2009 03:06