Attention conservation notice: Clearing out my drafts folder. 600+ words on some examples that I cut from a recent manuscript. Only of interest to (bored) statisticians.

The theme here is to construct some simple yet pointed examples where Bayesian inference goes wrong, even though the data-generating processes are well-behaved and the priors look harmless enough. In reality, however, there is no such thing as a prior without bias, and in these examples the bias is so strong that Bayesian learning reaches absurd conclusions.

The data *X_i* are IID, each drawn from an equal mixture of two unit-variance Gaussians, one with mean +1 and one with mean -1; the mixture therefore has mean 0 and variance 2. The prior puts probability 1/2 on each of the two component Gaussians, i.e., on the mean being +1 or on its being -1, so the prior predictive distribution is exactly the true data-generating distribution.

This figure shows typical sample paths for *z*, for the posterior
probability of the +1 mode, and for the relative entropy of the predictive
distribution from the data-generating distribution. (The latter is calculated
by Monte Carlo since I've forgotten how to integrate, so some of the fuzziness
is MC
noise.) Here is the R
code.
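The linked R code isn't reproduced here, but a minimal Python sketch of the same kind of simulation, under the setup described above (data from an equal mixture of N(-1,1) and N(+1,1), two-point prior on the mean), might look like this; the seed and sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Data: equal mixture of N(-1, 1) and N(+1, 1), so mean 0 and variance 2.
x = rng.choice([-1.0, 1.0], size=n) + rng.standard_normal(n)

# Two-point prior: probability 1/2 each on mu = +1 and mu = -1.
# The log likelihood ratio of +1 against -1 after k points is 2 * sum(x_1..x_k),
# so the posterior probability of the +1 mode is a logistic function of a
# mean-zero random walk: it keeps swinging between 0 and 1.
s = np.cumsum(x)
p_plus = 1.0 / (1.0 + np.exp(np.clip(-2.0 * s, -700.0, 700.0)))

# Relative entropy of the predictive mixture from the truth, by Monte Carlo
# (like the post, I integrate by simulation, so the answer is noisy).
def kl_from_truth(p, m=200_000):
    y = rng.choice([-1.0, 1.0], size=m) + rng.standard_normal(m)
    phi = lambda z: np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)
    truth = 0.5 * phi(y - 1.0) + 0.5 * phi(y + 1.0)
    pred = p * phi(y - 1.0) + (1.0 - p) * phi(y + 1.0)
    return float(np.mean(np.log(truth / pred)))

print("posterior P(+1) after n points:", p_plus[-1])
print("KL of predictive from truth:   ", kl_from_truth(p_plus[-1]))
```

Because the random walk `s` has mean zero and growing variance, the posterior typically ends up saturated near 0 or 1, and the divergence from the truth stays bounded away from zero, even though the prior predictive (p = 1/2) matches the truth exactly.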

*Exercise 1:* Work out the likelihood ratio between the two modes of
the prior, and so the posterior probability of each.

*Exercise 2:* Find the expected log-likelihood of an arbitrary-mean
unit-variance Gaussian under this data-generating distribution.

Keep the same data-generating distribution, but now let the prior be the conjugate prior for a Gaussian, namely another Gaussian, centered at zero. The posterior is then another Gaussian, which is a function of the sample mean, since the latter is a sufficient statistic for the problem.

*Exercise 3:* Find the mean and variance of the posterior
distribution as functions of the sample mean. (You could look them up, but
that would be cheating.)

As we get more and more data, the sample mean converges almost surely to
zero (by the law of large numbers), which here drives the mean and variance of
the posterior to zero almost surely as well. In other words, the Bayesian
becomes dogmatically certain that the data are distributed according to a
standard Gaussian with mean 0 and variance 1. This is so even though the
sample variance almost surely converges to the true variance, which is 2. This
Bayesian, then, is certain that the data are really not that variable,
and *any time now* will start settling down.
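A quick numerical check of this, again as a Python sketch rather than the linked R code; the conjugate-update formulas below are the textbook ones (so they give away Exercise 3), and the prior variance of 1 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Same data-generating process: equal mixture of N(-1, 1) and N(+1, 1).
x = rng.choice([-1.0, 1.0], size=n) + rng.standard_normal(n)

tau2 = 1.0  # prior variance for the N(0, tau2) prior on the mean
xbar = float(np.mean(x))

# Textbook conjugate update for a unit-variance Gaussian likelihood
# with an N(0, tau2) prior on the mean:
post_var = 1.0 / (n + 1.0 / tau2)
post_mean = n * xbar * post_var

print("posterior mean:    ", post_mean)   # near 0
print("posterior variance:", post_var)    # near 0
print("sample variance:   ", np.var(x))   # near 2, which the model ignores
```

The posterior concentrates ever more tightly on mean 0, while the sample variance sits at 2 and the model never notices.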

*Exercise 4:* Suppose that we take the prior from the previous
example, set it to 0 on the interval [-1,+1], and increase the prior everywhere
else by a constant factor to keep it normalized. Show that the posterior
density at every point except -1 and +1 will go to zero. (Hint: use exercise 2
and see here.)

**Update** in response to e-mails, 27 March: No, I'm not saying
that actual Bayesian statisticians are this dumb. A sensible practitioner
would, as Andy Gelman
always recommends, run a posterior predictive check, and discover that his
estimated model looks nothing at all like the data. But that sort of thing is
completely outside the formal apparatus of Bayesian inference. What amuses me
in these examples is that the formal machinery becomes so certain while being
so wrong, while starting from the right answer (and this while Theorem 5 from
my paper still applies!). See the second post by Brad DeLong, linked to below.

*Manual trackback*: Brad DeLong; and again Brad DeLong (with a simpler version of example 1!); The Statistical Mechanic

Posted at March 26, 2009 10:45 | permanent link