Attention conservation notice: 2300 words of technical, yet pretentious and arrogant, dialogue on a point which came up in a manuscript-in-progress, as well as in my long-procrastinated review of Plight of the Fortune Tellers. Why don't you read that book instead?

**Q**: You really shouldn't write in library books, you know;
and if you do, your marginalia should be more helpful, or less distracting,
than just "wrong wrong wrong!"

**A**: No harm done; my pen and I are both transparent
rhetorical devices. And besides, Rebonato *is* wrong in those passages.

**Q**: Really? Isn't his point that it's absurd to pretend you
could actually estimate something like a probability of an interest rate jump
so precisely that there's a real difference between calling it 0.500 000 and
calling it 0.499 967? Isn't it yet more absurd to think that you could get the
99.5 percent annual value-at-risk — the amount of money you'd expect to
lose once in two thousand years — down to four significant figures,
from *any* data set, let alone one that covers just five years and so
omits "not only the Black Death, the Thirty Years' War, the Barbarian
invasions, and the fall of the Roman Empire, but even the economic recession of
1991 — the only meaningful recession in the last twenty years" (as of
2006), to say nothing of the "famous corporate loan book crises of the
Paleochristian era" (p. 218)?

**A**: Of course all that's absurd, and Rebonato is right to
call people on it. By the time his book came out it was too late to do much
good, but if people had paid attention to such warnings I dare say we wouldn't
be quite so badly off now, and they had better listen in the future.

**Q**: So what's your problem? Oh, wait, let me guess: you're
upset because Rebonato's a Bayesian, aren't you? Don't bother, I can tell that
that's it. Look, we all know that you've got objections to that approach, but
at this point I'm starting to think that maybe you have *issues*. Isn't
this sort of reflexive hostility towards a whole methodology — something
you must run into every day of work — awkward and uncomfortable?
Embarrassing, even? Have you thought about seeking help?

**A**: Actually, I have a serious point to make here. What
Rebonato wants is entirely right-headed, but it fits very badly with his
Bayesianism, because Bayesian agents are never uncertain about probabilities;
at least, not about the probability of any observable event.

**Q**: But isn't Bayesianism *about* representing
uncertainty, and making decisions under uncertainty?

**A**: Yes, but Bayesian agents never have the kind of
uncertainty that Rebonato (sensibly) thinks people in finance should have.

**Q**: Let me try to pin you down in black and white.
[*Opens notebook*] I have here on one side of the page our old friend,
the well-known probability space (Ω, F, Prob). Coming out of it, in the
middle, is a sequence of random
variables *X*_{1}, *X*_{2},
... , *X*_{n}, ... , which have some joint distribution
or other. (And nothing really depends on its being a sequence, I could use a
random field on a network or whatever you like, add in covariates, etc.) On
the other side of the random variables, looking at them, I have a
standard-issue Bayesian agent. The agent has a hypothesis space, each
point *m* of which is a probability distribution for the random
sequence. This hypothesis space is measurable, and the agent also has a
probability measure, a.k.a. prior distribution, on this space. The agent uses
Bayes's rule to update the distribution by conditioning, so it has a sequence
of measures *D*_{0}, *D*_{1}, etc.

**A**: I think you are missing an "As you know, Bob", but yes,
this is the set-up I have in mind.

**Q**: Now I pick my favorite observable event *f*, a
set in the joint sigma-field of the *X*_{i}. For each
hypothesis
*m*, the probability
*m*(*f*) is well-defined. The Bayesian thinks this is a random
variable *M*(*f*), since it has a distribution *D* on the
hypothesis space. How is that
*not* being uncertain about the probability of *f*?

**A**: Well, in the first place —

**Q**: I am not interested in quibbles about *D* being a Dirac
delta function.

**A**: Fine, assume that *D* doesn't put unit mass on
any single hypothesis, and that it gives non-zero weight to hypotheses with
different values of *m*(*f*). But remember how Bayesian updating
works: The Bayesian, by definition, believes in a joint distribution of the
random sequence *X* and of the hypothesis *M*. (Otherwise,
Bayes's rule makes no sense.) This means that by integrating over *M*,
we get an unconditional, marginal probability for *f*:

$$P_n(f) = \mathbf{E}_{D_n}\left[ M(f \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) \right]$$

**Q**: Wait, isn't that the denominator in Bayes's rule?

**A**: Not quite, that equation defines a measure — the
predictive distribution — and the denominator in Bayes's rule is
the density
of that measure (with *n*=0) at the observed sequence.

**Q**: Oh, right, go on.

**A**: As an expectation
value, *P*_{n}(*f*) is a completely precise
number. The Bayesian has no uncertainty whatsoever in the probabilities it
gives to anything observable.
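
As a minimal sketch of this point, take a coin-flipping agent with a Beta prior over the coin's bias (the Beta-Bernoulli model, the function names, and the counts below are illustrative assumptions, not anything from Rebonato's book). The posterior over the unobservable bias has real spread, yet the probability the agent assigns to the next flip is one exact fraction.

```python
from fractions import Fraction

def predictive_prob_heads(a, b, heads, tails):
    """Posterior predictive P(next flip = heads) under a Beta(a, b) prior.

    The posterior is Beta(a + heads, b + tails); integrating the bias
    against it gives the posterior mean, a single exact number.
    """
    return Fraction(a + heads, a + b + heads + tails)

def posterior_variance(a, b, heads, tails):
    """Variance of the posterior over the (unobservable) bias."""
    a_n, b_n = a + heads, b + tails
    total = a_n + b_n
    return Fraction(a_n * b_n, total * total * (total + 1))

# Uniform prior (a = b = 1), then 3 heads and 2 tails observed.
p_next = predictive_prob_heads(1, 1, 3, 2)
spread = posterior_variance(1, 1, 3, 2)
print(p_next)   # 4/7
print(spread)   # 3/98
```

The non-zero `spread` describes uncertainty about the hypothesis; `p_next` carries none of it.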

**Q**: But won't those probabilities change over time, as it
gets new data?

**A**: Yes, but this just means that the random variables
aren't independent (under the Bayesian's distribution over observables).
Integrating *m* with respect to the prior
*D*_{0} gives us the infinite-dimensional distribution of a
stochastic process, one which is not (in general) equal to any particular
hypothesis, though of course it lies in their convex hull; the simple
hypotheses
are extremal
points. If the individual hypotheses are (laws of) independent,
identically-distributed random sequences, their mixture will
be exchangeable.
If the individual hypotheses are ergodic, their mixture will be asymptotically
mean-stationary.
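
A small check of the IID case, assuming a made-up hypothesis space of just two coins with equal prior weight (the biases and weights are hypothetical): the mixture gives every reordering of a sequence the same probability, but the flips are no longer independent under it.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-point hypothesis space: IID coins with these biases.
biases = [Fraction(1, 4), Fraction(3, 4)]
prior = [Fraction(1, 2), Fraction(1, 2)]

def mixture_prob(seq):
    """Probability of a 0/1 sequence under the prior-weighted mixture."""
    total = Fraction(0)
    for weight, p in zip(prior, biases):
        likelihood = Fraction(1)
        for x in seq:
            likelihood *= p if x == 1 else 1 - p
        total += weight * likelihood
    return total

# Exchangeable: the probability depends only on the counts, not the order.
exchangeable = all(
    mixture_prob(seq) == mixture_prob(tuple(sorted(seq)))
    for seq in product([0, 1], repeat=3)
)

# Not independent: P(X1=1, X2=1) differs from P(X1=1) * P(X2=1).
p1 = mixture_prob((1,))
p11 = mixture_prob((1, 1))
print(exchangeable, p1, p11, p1 * p1)   # True 1/2 5/16 1/4
```

Seeing one head raises the mixture's probability of a second head from 1/2 to 5/8, which is all that "learning" amounts to at the level of observables.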

**Q**: Don't you mean "stationary" rather than "asymptotically mean-stationary"?

**A**: No; see chapter
25 here, or better
yet that
trifler's authority.

**Q**: You were saying.

**A**: Right. The Bayesian integrates out *m* and gets
a stochastic process where the *X*_{i} are dependent.
As far as anything observable goes, the Bayesian's predictions, and therefore
its actions, are those of an agent which treats this stochastic process as
certainly correct.

**Q**: What happens if the Bayesian agent uses some kind of
hierarchical model, or the individual hypotheses are themselves
exchangeable/stationary?

**A**: The only thing that would change, for these purposes, is
the exact process the Bayesian is committed to. Convex mixtures of convex
mixtures of points in *C* are convex mixtures of points in *C*.
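
For a finite illustration (the weights $\lambda_j$ and $w_{ji}$ and the points $m_i$ are notation introduced here, not in the original), suppose the top level of the hierarchy puts weight $\lambda_j$ on the $j$-th sub-model, which in turn puts weight $w_{ji}$ on the point $m_i$ of *C*, with all weights non-negative and each set summing to one:

```latex
\sum_j \lambda_j \left( \sum_i w_{ji}\, m_i \right)
  = \sum_i \left( \sum_j \lambda_j w_{ji} \right) m_i ,
\qquad
\sum_i \sum_j \lambda_j w_{ji} = \sum_j \lambda_j = 1 .
```

The combined weights $\sum_j \lambda_j w_{ji}$ are again non-negative and sum to one, so the two-level mixture is a one-level mixture of the same points.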

**Q**: So to sum up, you're saying that the Bayesian agent is
uncertain about the truth of the unobservable hypotheses (that's their
posterior distribution), and uncertain about *exactly* which observable
events will happen (that's their predictive distribution), but *not*
uncertain about the probabilities of observables.

**A**: Right. (Some other time I'll
explain how that helps make Bayesian models testable.) And — here's
where we get back to Rebonato — all the things he is worried about, like
values-at-risk and so forth, are probabilities of observable events. Put a
Bayesian agent in the risk-modeling situation he talks about, and it won't just
say that the 99.5% VaR is 109.7 million euros rather than 110 million, it will
give you as many significant digits as you have time for.

**Q**: So let me read you something from pp. 194--195:

> Once frequentists accept (at a given statistical level of confidence) the point estimate of a quantity (say, a percentile), they tend to act as if the estimated number were *the* true value of the parameter. Remember that, for a frequentist, a coin cannot have a 40% chance of being biased. Either the coin is fair or it is biased. Either we are in a recession or we are not. We simply accept or reject these black-or-white statements at a certain confidence level... A Bayesian approach automatically tells us that a parameter (say, a percentile) has a whole distribution of possible values attached to it, and that extracting a single number out of this distribution (as I suggested above, the average, the median, the mode, or whatever) is a possibly sensible, but always arbitrary, procedure. No single number distilled from the posterior distribution is a *primus inter pares*: only the full posterior distribution enjoys this privileged status, and it is our choice what use to make of it.

This seems entirely reasonable; where do you think it goes wrong?

**A**: You mean, other than the fact that *point*
estimates do not have "statistical levels of confidence", and that Rebonato has
apparently forgotten about actual confidence *intervals*?

**Q**: Let's come back to that.

**A**: He is running together parameters of the unobserved
hypotheses, and the properties of the predictive distribution on which the
Bayesian acts. I can take any function I like of the
hypothesis, *g*(*m*) say, and use it as a parameter of the
distribution. If I have enough parameters *g*_{i} and
they're (algebraically) independent of each other, there's a 1-1 map between
hypotheses and parameter vectors — parameter vectors are unique names for
hypotheses. I could make parts of those names be readily-interpretable aspects
of the hypothetical distributions, like various percentiles or biases. The
distribution over hypotheses then gives me a distribution over
percentiles *conditional on* the hypothesis *M*. But we don't
know the true hypothesis, and on the next page Rebonato goes on to cast
"ontological" doubt about whether it even exists. (How he can
be *uncertain* about the state of something he thinks doesn't exist is a
nice question.) We only have the earlier observations, so we need to integrate
or marginalize out *M*, and this collapses the distribution of
percentiles down to a single exact value for that percentile.

**Q**: Couldn't we avoid that integration somehow?

**A**: Integrating over the posterior distribution is the
whole *point* of Bayesian decision theory.

**Q**: Let's go back to the VaR example. If you try estimating
the size of once-in-two-thousand-year losses from five years of data, your
posterior distribution has got to be pretty diffuse.

**A**: Actually, it can be arbitrarily concentrated by picking the right prior.

**Q**: Fine, for any *reasonable* prior it needs to be
pretty diffuse. Shouldn't the Bayesian agent be able to use this information
to avoid recklessness?

**A**: That depends on the loss function. If the loss involves
which hypothesis happens to be true, sure, it'll make a difference. (That's
how we get the classic proof that if the loss is the squared difference between
the true parameter and the point estimate, the best decision is the posterior
mean.) But if the loss function just involves what observable events actually
take place, then no. Or, more exactly, it might *make sense* to show
more caution if your posterior distribution is very diffuse, but that's not
actually licensed by Bayesian decision theory; it is "irrational" and sets you
up for a Dutch Book.
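
A sketch of why diffuseness drops out, assuming two invented discrete posteriors over a coin's bias and a hypothetical bet whose loss depends only on the outcome: the posterior expected loss reduces to the predictive probability times the loss, so the two posteriors, one diffuse and one concentrated, rate the bet identically.

```python
from fractions import Fraction as F

def predictive(posterior):
    """P(next outcome = 1): integrate each hypothesis's bias over the posterior."""
    return sum(weight * bias for bias, weight in posterior.items())

def expected_loss(posterior, loss_if_1, loss_if_0):
    """Posterior expected loss of an action whose loss depends only on the outcome."""
    return sum(
        weight * (bias * loss_if_1 + (1 - bias) * loss_if_0)
        for bias, weight in posterior.items()
    )

# Two made-up posteriors over a coin's bias, mapping bias -> weight.
diffuse = {F(1, 10): F(2, 5), F(1, 2): F(1, 5), F(9, 10): F(2, 5)}
concentrated = {F(49, 100): F(1, 4), F(1, 2): F(1, 2), F(51, 100): F(1, 4)}

# A hypothetical bet: lose 3 if the outcome is 1, gain 2 (loss of -2) otherwise.
loss_diffuse = expected_loss(diffuse, 3, -2)
loss_concentrated = expected_loss(concentrated, 3, -2)
print(predictive(diffuse), predictive(concentrated))   # 1/2 1/2
print(loss_diffuse, loss_concentrated)                 # 1/2 1/2
```

Any extra caution in the diffuse case would have to come from outside this calculation.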

**Q**: Should I be worried about having a Dutch Book made
against me?

**A**: I
can't see why, but some people seem to find the prospect worrying.

**Q**: So what should people do?

**A**: I wish I had a good answer. Many of Rebonato's actual
suggestions — things like looking at a range of scenarios, robust
strategies, not treating VaR as the only thing you need, etc. — make a
lot of sense. (When he is making these practical recommendations, he does not
counsel people to engage in a careful quantitative elicitation of their
subjective prior probabilities, and then calculate posterior distributions via
Bayes's rule; I wonder why.) I would also add that there *are* such
things as confidence intervals, which do let you make probabilistic guarantees
about parameters.

**Q**: What on Earth do you mean by a "probabilistic
guarantee"?

**A**: That either the right value of the parameter is in the
confidence set,
*or* you get very unlucky with the data (how unlucky depends on the
confidence level), *or* the model is
wrong. Unlike
coherence, coverage connects you to reality. This is basically why
Haavelmo told the
econometricians, back in the
day, that they needed confidence intervals, not point estimates.
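
Coverage can be checked by simulation. The sketch below assumes, purely for illustration, Gaussian data with known unit variance and the textbook 95% interval for the mean; `coverage` and its parameters are names made up here. If the model holds, the interval should trap the true mean in roughly 95% of repetitions, whatever that true mean is.

```python
import random

def coverage(true_mean, n, trials, seed=0):
    """Fraction of trials in which a 95% interval for the mean covers the truth.

    Assumes (for illustration only) Gaussian data with known unit variance,
    so the interval is sample mean +/- 1.96 / sqrt(n).
    """
    rng = random.Random(seed)
    half_width = 1.96 / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials

rate = coverage(true_mean=3.0, n=25, trials=5000)
print(rate)   # should land near 0.95
```

If the data came from some other distribution than the one assumed, the same code would report how far actual coverage falls from the nominal level, which is the "or the model is wrong" clause in action.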

**Q**: So how did the econometricians come to make fetishes of
unbiased point-estimators and significance tests of equality constraints?

**A**: No doubt for the same reason they became convinced that
linear and logistic regression were all you'd ever need to deal with any
empirical data ever.

**Q**: *Any*way, that "get the model right" part seems
pretty tricky.

**A**: *Everyone* is going to have to deal with that.
(You certainly still have to worry about mis-specification
with Bayesian updating.) You can
test your modeling
assumptions, and you can weaken them so you are less susceptible to
mis-specification.

**Q**: Don't you get weaker conclusions — in this case,
bigger confidence intervals — from weaker modeling assumptions?

**A**: That's an unavoidable trade-off, and it's certainly not
evaded by going Bayesian (as Rebonato knows full well). With very weak, and
therefore very defensible, modeling assumptions, the confidence interval on,
say, the 99.5% VaR may be so broad that you can't devise any sensible strategy
which copes with that whole range of uncertainty, but that's the math's way of
telling you that you don't have enough data, and enough understanding of the
data, to talk about once-in-two-thousand-year events. I suppose that, if they
have financial engineers in
the stationary
state, they might eventually be able to look back on enough
sufficiently-converged data to do something at the 99% or even 99.5% level.
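
To put a rough number on "enough data", here is a distribution-free sketch under one strong assumption: annual losses are independent draws from a single continuous distribution (which a non-stationary economy may well violate). Under that assumption the largest loss seen in *n* years exceeds the true *q*-quantile with probability 1 − *q*^*n*, so the sample maximum is a valid upper confidence bound only once *n* is large. The function names are illustrative.

```python
import math

def max_covers_quantile(n, q):
    """P(sample maximum >= the q-quantile) for n IID draws from a continuous law."""
    return 1.0 - q ** n

def years_needed(q, confidence):
    """Smallest n for which the sample maximum is an upper confidence bound
    on the q-quantile at the given confidence level."""
    return math.ceil(math.log(1.0 - confidence) / math.log(q))

print(round(max_covers_quantile(5, 0.995), 4))   # 0.0248
print(years_needed(0.995, 0.95))                 # 598
```

With five years of annual data, the worst loss on record bounds the 99.5% VaR from above only about 2.5% of the time; getting that to 95% takes about six centuries of comparable data.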

**Q**: Wait, doesn't that suggest that there is a *much
bigger* problem with all of this? The economy is non-stationary, right?

**A**: Sure looks like it.

**Q**: So how can we use statistical models to forecast it?

**A**: If you want someone to solve the problem of induction,
the philosophy department is down the stairs and to the left.

*Manual trackback*: Quomodocumque;
Robert Waldmann at Angry Bear;
Brad DeLong

Bayes, anti-Bayes; Enigmas of Chance; The Dismal Science; Dialogues

Posted at June 16, 2009 09:50