Bayesianism in Math: No Dice
Attention conservation notice: Sniping at someone else's
constructive attempt to get the philosophy of mathematics to pay more attention
to how mathematicians actually discover stuff, because it uses an idea that
pushes my buttons. Assumes you know measure-theoretic probability without
trying to explain it. Written by someone with absolutely no qualifications in
philosophy, and precious few in mathematics for that matter. Largely drafted
back in 2013, then laid aside. Posted now in lieu of new content.
Wolfgang points to an interesting post [archived] at "A Mind for Madness"
on using Bayesianism in the philosophy of mathematics, specifically to
give a posterior probability for conjectures (e.g., the Riemann
conjecture) given the "evidence" of known results. Wolfgang uses this as a
jumping-off point for looking at whether a Bayesian might slide around the
halting problem and Gödel's theorem, or more exactly whether a Bayesian
with \( N \) internal states can usefully calculate any posterior probabilities
of halting for another Turing machine with \( n < N \) states. (I suspect that
would fail for the same reasons my idea of using learning theory to do so
fails; it's also related to work by Aryeh "Absolutely Regular" Kontorovich
on finite-state estimation, and even older ideas by the late great Thomas
Cover and Martin Hellman.)
My own take is different. Knowing how I feel about the idea of using
Bayesianism to give probabilities to theories about the
world, you can imagine that I look on the idea of giving probabilities to
theorems with complete disfavor. And indeed I think it would run into
insuperable trouble for purely internal, mathematical reasons.
Start with what mathematical probability is. The basics of a
probability space are a carrier space \( \Omega \), a \( \sigma \)-field \(
\mathcal{F} \) on \( \Omega \), and a probability measure \( P \) on \(
\mathcal{F} \). The mythology is that God, or Nature, picks a point \( \omega
\in \Omega \), and then what we can resolve or perceive about it is whether \(
\omega \in F \), for each set \( F \in \mathcal{F} \). The probability measure
\( P \) tells us, for each observable event \( F \), what fraction of draws of
\( \omega \) are in \( F \). Let me emphasize that there is nothing about the
Bayes/frequentist dispute involved here; this is just the structure of
measure-theoretic probability, as agreed to by (almost) all parties ever since
Kolmogorov laid it down in 1933 ("Andrei Nikolaevitch said it, I believe it,
and that's that").
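On a finite carrier space, all of these requirements can be checked
mechanically. Here is a minimal sketch in Python; the four-point space and
its weights are invented purely for illustration:

```python
from fractions import Fraction
from itertools import combinations

# A toy probability space: Omega is a four-point carrier set, F is its full
# power set (the largest possible sigma-field), and P comes from point
# masses. The points and weights here are made up for illustration.
Omega = frozenset({"a", "b", "c", "d"})
mass = {"a": Fraction(1, 2), "b": Fraction(1, 4),
        "c": Fraction(1, 8), "d": Fraction(1, 8)}

def power_set(s):
    s = list(s)
    return {frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)}

F = power_set(Omega)          # all 16 events

def P(event):
    return sum(mass[w] for w in event)

# Kolmogorov's requirements, verified exhaustively on this finite space:
assert P(Omega) == 1                      # normalization
for E in F:
    assert Omega - E in F                 # closed under complement
    for G in F:
        assert E | G in F                 # closed under union
        if not (E & G):                   # additivity on disjoint events
            assert P(E | G) == P(E) + P(G)
```

On a finite space, finite and countable additivity coincide; the
distinction between them only bites when \( \Omega \) is infinite.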
To assign probabilities to propositions like the Riemann conjecture, the
points \( \omega \) of the base space \( \Omega \) would seem to have to be
something like "mathematical worlds", say mathematical models of some
axiomatic theory.
That is, selecting an \( \omega \in \Omega \) should determine the truth or
falsity of any given proposition like the fundamental theorem of algebra, the
Riemann conjecture, Fermat's last theorem, etc. There would then seem to be
three cases:
- Case 0: The worlds in \( \Omega \) conform to different axioms, and so the
global truth or falsity of a proposition like the Riemann conjecture is
ambiguous and undetermined.
- Case 1: All the worlds in \( \Omega \) conform to the same axioms, and the
conjecture, or its negation, is a theorem of those axioms. That is, it is
true (or false) in all models, no matter how the axioms are interpreted, and
hence it has an unambiguous truth value.
- Case 2: The worlds all conform to the same axioms, but the proposition of
interest is true in some interpretations of the axioms and false in others.
Hence the conjecture has no unambiguous truth value.
Case 0 is boring: we know that different axioms will lead to different results.
Let's concentrate on cases 1 and 2. What do they say about the probability of
a set like \( R = \left\{\omega: \text{Riemann conjecture is true in}\
\omega \right\} \)?
- Case 1: The Conjecture Is a Theorem
- Case 1 is that the conjecture (or its negation) is a theorem of the axioms.
Then the conjecture must be true (or false) in every \( \omega \), so \( P(R) =
0 \) or \( P(R) = 1 \). Either way, there is nothing for a Bayesian to learn.
- The only escape I can see from this has to do with the \( \sigma \)-field \(
\mathcal{F} \). Presumably, in mathematics, this would be something like
"everything easily deducible from the axioms and known propositions",
where we would need to make "easy deduction" precise, perhaps in terms of the
length of proofs. It then could happen that \( R \not\in \mathcal{F} \), i.e.,
the set is not a measurable event. In fact, we can deduce from
Gödel that many such sets are not measurable if we take \( \mathcal{F}
\) to be "is provable from the axioms", so even more must be non-measurable if
we restrict ourselves to not seeing very far beyond the axioms. We could then
bracket the probability of the Riemann conjecture from below, by the
probability of any measurable sub-set (sub-conjecture?), and from above, by the
probability of any measurable super-set. (The "inner" and "outer" measures of
a set come, roughly speaking, from making those bounds as tight as possible.
When they match, the set is measurable.) But even then, every measurable set
has either probability 0 or probability 1, so this doesn't seem very useful.
- (The poster, hilbertthm90, suggests bracketing the probability of the
conjecture by getting "the most
optimistic person about a conjecture to overestimate the probability and the
most skeptical person to underestimate the probability", but this assumes that
we can have a probability, rather than just inner and outer measures. This is
also a separate question from the need to make up a number for the probability
of known results if the conjecture is false. This is the problem of the
catch-all or unconceived-alternative term, and
it's crippling.)
- Another way to get to the same place is to look carefully at what's meant by
a \( \sigma \)-field. It is a collection of subsets of \( \Omega \) which is
closed under repeating the Boolean operations of set theory, namely
intersection, union and negation, a countable infinity of times. Anything
which can be deduced from the axioms in a countable number of steps is
included. This is a core part of the structure of probability theory; if you
want to get rid of it, you are not talking about what we've understood by
"probability" for a century, but about something else. It is true that some
people would weaken this requirement from a \( \sigma \)-field to just a field
which is closed under a finite number of Boolean operations, but that
would still permit arbitrarily long chains of deduction from axioms. (One
then goes from "countably-additive probability" to "finitely-additive
probability".) That doesn't change the fact that anything which is
deducible from the axioms in a finite number of steps (i.e., has a finite
proof) would have measure 1.
- Said yet a third way, a Bayesian agent immediately has access to all
logical consequences of its observations and its prior, including in its
prior any axioms it might hold. Hence to the extent that mathematics is about
finding proofs, the Bayesian agent has no need to do math; it
just knows mathematical truths. The Bayesian agent is thus a very,
very bad formalization of a human mathematician indeed.
- Case 2: The Conjecture Is Not a Theorem
- In this case, the conjecture is true under some models of the axioms but
false in others. We thus can get intermediate probabilities for the
conjecture, \( 0 < P(R) < 1 \). Unfortunately, learning new theorems cannot
change the probability that we assign to the conjecture. This is because
theorems, as seen above, have probability 1, and conditioning on an event of
probability 1 is the same as not conditioning at all.
There are a lot of interesting thoughts in the post about how mathematicians
think, especially how they use analogies to get a sense of which conjectures
are worth exploring, or feel like they are near to provable theorems. (There
is also no mention of Polya: but sic
transit gloria mundi.) It would be very nice to have some formalization
of this, especially if the formalism was both tractable and could improve
practice. But I completely fail to see how Bayesianism could do the job.
That post is based on Corfield's Towards a Philosophy of Real Mathematics,
which I have not laid hands on, but which seems, judging from this review,
to show more awareness of the difficulties than the post does.
Addendum, August 2021: I have since tracked down an electronic copy of
Corfield's book. While he has sensible things to say
about the role of conjecture, analogy and "feel" in mathematical discovery,
drawing on Polya, he also straightforwardly disclaims the "logical omniscience"
of the standard Bayesian agent. But he does not explain what formalism he
thinks we should use to replace standard probability theory. (The terms
"countably additive" and "finitely additive" do not appear in the text of the
book, and I'm pretty sure "\( \sigma \)-field" doesn't either, though that's
harder to search for. I might add that Corfield also does nothing to explicate
the carrier space \( \Omega \).) I don't think this is because Corfield isn't
sure about what the right formalism would be; I think he just doesn't
appreciate how much of the usual Bayesian machinery he's proposing to discard.
Mathematics;
Philosophy;
Bayes, anti-Bayes