One of the ideas in physics which makes no sense to me (no matter how often people much smarter than I am have tried to convince me) is that statistical mechanics is basically an application of Bayesian ideas about statistical inference. On this view, the probabilities I calculate when I solve a stat. mech. problem --- say, the probability that all the molecules of air in this room will, in the next minute, be found at least three feet above ground level --- are not statements about how often such events occur. Rather, they are statements about the strength of my belief that such things will occur. Thermodynamic entropy, in particular, is supposed to be just the information-theoretic Shannon entropy of my distribution over molecular states: how much uncertainty I have about the molecular state of whatever it is I'm dealing with.

Here's an (unfair) way of putting it: water boils because I become
sufficiently ignorant of its molecular state. This is a problem, because water
boiled a thousand years ago, when people didn't know it was made of molecules,
and *a fortiori* weren't uncertain about the state of those molecules.
Presumably it boils even when nobody's there to look... The usual dodge is to
say that it's not really *my* uncertainty about the molecular state that
matters, but that of some kind of idealized observer who knows all the relevant
facts about molecules and their behavior, knows what I do about the gross,
macroscopic observables (e.g., thermometer and pressure-gauge readings), and
synthesizes all these data optimally. Generally the last bit means some
combination of Bayes's rule and selecting the distribution with the maximum
possible entropy, subject to constraints from the observations. I don't find
this a persuasive story, for pretty conventional reasons I won't go over here.
(See, e.g., David Albert's *Time and Chance*.) I have, however, just found
what seems like a new
objection: the ideal observer should think that entropy
doesn't *increase*, so its arrow of time should run *backwards*.
This was sparked by a remark my friend Eric Smith made in a completely
different context, and has somehow grown into a four-page preprint. (Eric
should not be blamed for this in any way.)

"The Backwards Arrow of Time of the Coherently Bayesian Statistical Mechanic", cond-mat/0410063

The story goes like this. Observe your system at time 0, and invoke your
favorite way of going from an observation to a distribution over the system's
states --- say the maximum entropy principle. This distribution will have some
Shannon entropy, which by hypothesis is also the system's thermodynamic
entropy. Assume the system's dynamics are invertible, so that the state at
time *t* determines the states at times *t*+1 and *t*-1.
This will be the case if the system obeys the usual laws of classical
mechanics, for example. Now let your system evolve forward in time for one
time-step. It's a basic fact about invertible dynamics that they leave Shannon
entropy invariant, so it's still got whatever entropy it had when you started.
Now make a new observation. If you update your probability distribution using
Bayes's rule, a basic result in information theory shows that the Shannon
entropy of the posterior distribution is, on average, no more than that of the
prior distribution. There's no way an observation can make you *more*
uncertain about the state *on average*, though particular observations
may be very ambiguous. (Noise-free measurements would let us drop the "on
average" qualifier.) Repeating this, we see that entropy decreases over time
(on average). And so heat flows from cold bodies to warm ones, ice cubes
spontaneously form in glasses of water, sugar cubes crystallize out of cups of
tea, milk unstirs itself from coffee, and corpses sit up and
write learned volumes in a well-ordered script. Q.E.D.
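
The two steps of the argument can be checked numerically on a toy system. What follows is only a minimal sketch under assumptions that are mine, not the preprint's: eight microstates, dynamics given by an arbitrary fixed permutation (`PERM`), and a two-valued instrument (`LIKELIHOOD`) that reports which half of the state space the system is in and misreads 10% of the time. All function names are made up for the illustration. It computes the exact average, over every possible sequence of readings, of the Shannon entropy after repeated rounds of evolve-then-observe.

```python
import math


def shannon_entropy(p):
    """Shannon entropy, in bits, of a distribution given as a list."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def evolve(p, perm):
    """Push the distribution forward under the invertible map x -> perm[x]."""
    out = [0.0] * len(p)
    for x, q in enumerate(p):
        out[perm[x]] += q
    return out


def posteriors(p, likelihood):
    """Yield (P(reading), Bayes posterior given that reading) for each reading.

    likelihood[y][x] is P(reading y | state x).
    """
    for row in likelihood:
        joint = [l * q for l, q in zip(row, p)]
        p_y = sum(joint)
        if p_y > 0:
            yield p_y, [j / p_y for j in joint]


def average_entropy(p, perm, likelihood, steps):
    """Exact expected entropy after `steps` rounds of evolve-then-observe."""
    if steps == 0:
        return shannon_entropy(p)
    moved = evolve(p, perm)  # invertible step: entropy unchanged
    return sum(
        p_y * average_entropy(post, perm, likelihood, steps - 1)
        for p_y, post in posteriors(moved, likelihood)
    )


# Hypothetical toy system (illustrative assumptions, not from the preprint).
PERM = [3, 6, 0, 5, 1, 7, 2, 4]        # an arbitrary permutation of 8 states
LIKELIHOOD = [
    [0.9] * 4 + [0.1] * 4,             # P(reading 0 | state)
    [0.1] * 4 + [0.9] * 4,             # P(reading 1 | state)
]
PRIOR = [1 / 8] * 8                    # maximum-entropy starting distribution

if __name__ == "__main__":
    for n in range(6):
        print(n, round(average_entropy(PRIOR, PERM, LIKELIHOOD, n), 4))
```

The printed averages never go up from one round to the next: the permutation merely relabels the states, so it leaves the entropy alone, and each Bayesian update can only lower it on average. Individual branches of the recursion may still show a posterior more spread out than its prior, which is why the "on average" qualifier is needed when the instrument is noisy.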

Some people like having probability distributions for things which aren't random variables, but don't want to update them with Bayes's rule. I feel that if you are going to have such awful things, Bayes's rule is the right way to handle them, but I do consider one alternative. This is to pick the distribution which maximizes the Shannon entropy, subject not just to one constraint (from our original measurement) but two (from both measurements). A trick (the Koopman operator) lets us go from having one constraint at each time to a pair of constraints at a common time, which is easier to handle. This, too, leads to the entropy falling with each observation (and not just on average either).
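
To spell out why the maximum-entropy alternative also pushes the entropy down, here is a sketch in notation of my own choosing (the preprint's may differ): $T$ is the invertible dynamics, $f$ the measured observable, $a_0$ and $a_1$ the values constrained at times 0 and 1, and $U$ the Koopman operator, $Uf = f \circ T$. Referring both constraints to time 0 is an assumption made for the illustration; since invertible dynamics preserve Shannon entropy, the choice of common time should not matter.

```latex
% Constraint from the second measurement, rewritten at time 0
% (\rho_1 is the image of \rho_0 under T):
\mathbb{E}_{\rho_1}[f] \;=\; \mathbb{E}_{\rho_0}[f \circ T] \;=\; \mathbb{E}_{\rho_0}[Uf] \;=\; a_1

% Maximizing over a smaller feasible set cannot give a larger maximum:
\max_{\rho \,:\, \mathbb{E}_\rho[f] = a_0,\; \mathbb{E}_\rho[Uf] = a_1} H[\rho]
\;\le\;
\max_{\rho \,:\, \mathbb{E}_\rho[f] = a_0} H[\rho]
```

The inequality holds for every value of $a_1$, not merely on average over readings, which is the sense in which the fall is not just an average effect.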

I can see only three ways out. The obvious one is to give up the
identification of thermodynamic entropy with anyone's uncertainty, including
the ideal observer's. I think that's the right way to go. (It doesn't even
prohibit you from saying *probability* is degree-of-belief.) And, of
course, states of equilibrium are and remain states of maximum entropy; it's
just that that's a fact about the physical world, and not about inductive logic
(or what-not). The others are to abandon the usual laws of motion, or to cook
up some *really* weird form of subjectivist statistical inference. I
don't hold out much hope for either of these, but if I had to choose, the former
sounds more promising. That's because we know that the isolated system obeying
the laws of classical mechanics is just an approximation to an open
quantum-mechanical system, so maybe quantum effects, e.g., environmental
decoherence, will turn out to make the Shannon entropy increase. But I don't
understand quantum statistics well enough to check that. (This looks like a good place to
start.)

I should also mention an interesting tangent. Several people have noticed
that the calculations the Bayesian statistical mechanic has to make are going
to be quite complicated, at least if the dynamics are interestingly irregular.
One might be able to show that an agent which needs predictions sooner rather
than later might be better off ignoring historical data, and making its
predictions as though the latest measurement were the *only* one. The
error due to this approximation would be less costly than the time needed to
make a more accurate calculation. (I suspect mixing will prove to be a
necessary condition for this to hold.) This would not explain why the
thermodynamic entropy should be the one connected to *this* tractable
approximation, of course, so it doesn't resolve the paradox. Alternatively,
physical observers would need to store data about earlier observations somehow,
and perhaps one can show (as in the Szilard-Zurek approach to Maxwell's
Demon) that the entropic cost of storing the data more than cancels the
reduction in the system's entropy. But this doesn't keep the *system's*
entropy from falling. Anyway, this returns us to the
water-boils-when-I-grow-ignorant situation.

I'm really not quite sure what to make of this, or where to send it. It's
too short for a philosophy-of-science journal, it has little *physical*
content, and the mathematics is quite basic (which does not mean, of course,
that it's been properly used, especially not in my hands). The only place I
can think of is the "brief reports" section of Physical Review
E --- other suggestions would be welcomed.

*Manual trackback*: the blog formerly known as The Statistical Mechanic

Posted at October 07, 2004 17:45