Paradox! Stone-Cold Paradox! Get It Before It's Warm!

One of the ideas in physics which makes no sense to me (no matter how often people much smarter than I am have tried to convince me) is that statistical mechanics is basically an application of Bayesian ideas about statistical inference. On this view, the probabilities I calculate when I solve a stat. mech. problem --- say, the probability that all the molecules of air in this room will, in the next minute, be found at least three feet above ground level --- are not statements about how often such events occur. Rather, they are statements about the strength of my belief that such things will occur. Thermodynamic entropy, in particular, is supposed to be just the information-theoretic Shannon entropy of my distribution over molecular states: how much uncertainty I have about the molecular state of whatever it is I'm dealing with.

Here's an (unfair) way of putting it: water boils because I become sufficiently ignorant of its molecular state. This is a problem, because water boiled a thousand years ago, when people didn't know it was made of molecules, and a fortiori weren't uncertain about the state of those molecules. Presumably it boils even when nobody's there to look... The usual dodge is to say that it's not really my uncertainty about the molecular state that matters, but that of some kind of idealized observer who knows all the relevant facts about molecules and their behavior, knows what I do about the gross, macroscopic observables (e.g., thermometer and pressure-gauge readings), and synthesizes all these data optimally. Generally the last bit means some combination of Bayes's rule and selecting the distribution with the maximum possible entropy, subject to constraints from the observations. I don't find this a persuasive story, for pretty conventional reasons I won't go over here. (See, e.g., David Albert's Time and Chance.) I have, however, just found what seems like a new objection: the ideal observer should think that entropy doesn't increase, so its arrow of time should run backwards. This was sparked by a remark my friend Eric Smith made in a completely different context, and has somehow grown into a four-page preprint. (Eric should not be blamed for this in any way.)
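The maximum-entropy step can be made concrete. A minimal sketch, assuming a finite set of states with known energies and a single observed constraint on the mean energy (a stand-in for a thermometer reading): the distribution maximizing Shannon entropy under that constraint has the Gibbs form p_i ∝ exp(-β E_i), and β can be found numerically. The function names and energy levels are hypothetical, chosen only for illustration.

```python
import math

def shannon_entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def maxent_distribution(energies, mean_energy, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution over discrete states with a fixed mean
    energy. The maximizer has the Gibbs form p_i ∝ exp(-beta * E_i); the mean
    energy decreases monotonically in beta, so beta is found by bisection.
    Assumes the target lies strictly between min and max energy and that the
    energies are small enough that exp() does not overflow on [lo, hi]."""
    e0 = min(energies)  # shift energies for numerical stability

    def gibbs(beta):
        w = [math.exp(-beta * (e - e0)) for e in energies]
        z = sum(w)
        return [x / z for x in w]

    def mean(beta):
        return sum(p * e for p, e in zip(gibbs(beta), energies))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) > mean_energy:
            lo = mid  # mean too high: need a larger beta
        else:
            hi = mid
    return gibbs(0.5 * (lo + hi))

energies = [0.0, 1.0, 2.0, 3.0]  # hypothetical energy levels
p = maxent_distribution(energies, mean_energy=1.0)
print([round(x, 3) for x in p], round(shannon_entropy(p), 3))
```

Bisection is used only because it is short and robust here; any one-dimensional root finder for β would do.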

"The Backwards Arrow of Time of the Coherently Bayesian Statistical Mechanic", cond-mat/0410063

The story goes like this. Observe your system at time 0, and invoke your favorite way of going from an observation to a distribution over the system's states --- say the maximum entropy principle. This distribution will have some Shannon entropy, which by hypothesis is also the system's thermodynamic entropy. Assume the system's dynamics are invertible, so that the state at time t determines the states at times t+1 and t-1. This will be the case if the system obeys the usual laws of classical mechanics, for example. Now let your system evolve forward in time for one time-step. It's a basic fact about invertible dynamics that they leave Shannon entropy invariant, so it's still got whatever entropy it had when you started. Now make a new observation. If you update your probability distribution using Bayes's rule, a basic result in information theory (conditioning never increases entropy, and the average entropy of the posterior is just the conditional entropy of the state given the observation) shows that the Shannon entropy of the posterior distribution is, on average, no more than that of the prior distribution. There's no way an observation can make you more uncertain about the state on average, though particular observations may be very ambiguous. (Noise-free measurements would let us drop the "on average" qualifier.) Repeating this, we see that entropy decreases over time (on average). And so heat flows from cold bodies to warm ones, ice cubes spontaneously form in glasses of water, sugar cubes crystallize out of cups of tea, milk unstirs itself from coffee, and corpses sit up and write learned volumes in a well-ordered script. Q.E.D.
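The two facts the argument leans on can be checked numerically on a toy system. This is a hypothetical illustration, not the preprint's construction: it assumes 8 microstates, a fixed permutation standing in for invertible dynamics, and a noisy two-valued "gauge" that reports which half of the state space the system is in, correctly with probability 0.9. It computes exactly (by enumerating every observation sequence) the average posterior entropy after k rounds of evolve-then-update.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def evolve(p, perm):
    """Invertible dynamics on a finite state space: state i goes to perm[i].
    This only relabels the probabilities, so the entropy is unchanged."""
    q = [0.0] * len(p)
    for i, pi in enumerate(p):
        q[perm[i]] = pi
    return q

def bayes_update(p, row):
    """Condition p on an observation y, where row[x] = P(y | state x).
    Returns (P(y), posterior)."""
    joint = [r * px for r, px in zip(row, p)]
    py = sum(joint)
    return py, ([j / py for j in joint] if py > 0 else p)

def average_entropy_after(p, perm, likelihood, steps):
    """Exact average, over all observation sequences, of the posterior
    entropy after `steps` rounds of evolve-then-observe."""
    if steps == 0:
        return entropy(p)
    p = evolve(p, perm)
    avg = 0.0
    for row in likelihood:
        py, post = bayes_update(p, row)
        if py > 0:
            avg += py * average_entropy_after(post, perm, likelihood, steps - 1)
    return avg

# Hypothetical toy system (all numbers made up for illustration).
n = 8
perm = [3, 6, 0, 7, 2, 5, 1, 4]
likelihood = [[0.9 if x < n // 2 else 0.1 for x in range(n)],
              [0.1 if x < n // 2 else 0.9 for x in range(n)]]
prior = [0.4, 0.2, 0.1, 0.1, 0.1, 0.05, 0.03, 0.02]

for k in range(5):
    print(k, round(average_entropy_after(prior, perm, likelihood, k), 4))
```

The printed averages should never increase with k: the evolve step leaves entropy alone, and each Bayesian update can only lower it on average.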

Some people like having probability distributions for things which aren't random variables, but don't want to update them with Bayes's rule. I feel that if you are going to have such awful things, Bayes's rule is the right way to handle them, but I do consider one alternative. This is to pick the distribution which maximizes the Shannon entropy, subject not just to one constraint (from our original measurement) but two (from both measurements). A trick (the Koopman operator, which turns a measurement of an observable at a later time into a measurement of a transformed observable at the earlier time) lets us go from having one constraint at each time to a pair of constraints at a common time, which is easier to handle. This, too, leads to the entropy falling with each observation (and not just on average either).
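A minimal sketch of the Koopman trick, under simplifying assumptions that are mine rather than the preprint's: 16 microstates, a hypothetical invertible map T(x) = (5x + 3) mod 16, and noise-free coarse readings f(x) = x // 4 (which quarter the state is in). The Koopman operator sends an observable f to f∘T, so a reading taken at time t becomes a constraint on the state at time 0. With hard constraints like these, the maximum-entropy distribution is simply uniform on the states that satisfy all of them.

```python
import math

def koopman(g, T):
    """Koopman operator: (U g)(x) = g(T(x)). A reading of g one step later is
    equivalent to a reading of U g now."""
    return lambda x: g(T(x))

def maxent_entropy(states, constraints):
    """With hard constraints g(x) == v, the maximum-entropy distribution is
    uniform on the states satisfying all of them; return its entropy in bits.
    Assumes the constraints are consistent (at least one state survives)."""
    ok = [x for x in states if all(g(x) == v for g, v in constraints)]
    return math.log2(len(ok))

def T(x):
    return (5 * x + 3) % 16  # invertible because gcd(5, 16) = 1

def f(x):
    return x // 4            # coarse "macroscopic" reading

states = range(16)
x0 = 6                       # hypothetical true microstate at time 0

constraints, g, x = [], f, x0
for t in range(4):
    constraints.append((g, f(x)))  # reading at time t, pulled back to time 0
    print(t, maxent_entropy(states, constraints))
    g, x = koopman(g, T), T(x)
```

Each added constraint can only shrink the set of admissible initial states, so the maximum-entropy value is non-increasing along every run, not just on average.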

I can see only three ways out. The obvious one is to give up the identification of thermodynamic entropy with anyone's uncertainty, including the ideal observer's. I think that's the right way to go. (It doesn't even prohibit you from saying probability is degree-of-belief.) And, of course, states of equilibrium are and remain states of maximum entropy; it's just that that's a fact about the physical world, and not about inductive logic (or what-not). The others are to abandon the usual laws of motion, or to cook up some really weird form of subjectivist statistical inference. I don't hold out much hope for either of these, but if I had to choose, the former sounds more promising. That's because we know that the isolated system obeying the laws of classical mechanics is just an approximation to an open quantum-mechanical system, so maybe quantum effects, e.g., environmental decoherence, will turn out to make the Shannon entropy increase. But I don't understand quantum statistics well enough to check that. (This looks like a good place to start.)
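For contrast with the invertible case, here is a purely classical caricature of giving up the usual laws of motion. It is not a model of decoherence, and it says nothing about whether the effect would outweigh Bayesian updating. The hypothetical noisy dynamics follow a fixed permutation with probability 1 - eps and jump to a uniformly random state with probability eps. Every row and column of that transition matrix sums to 1 (it is doubly stochastic), and doubly stochastic dynamics can never lower Shannon entropy.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def step(p, P):
    """One step of Markov dynamics: q[j] = sum_i p[i] * P[i][j]."""
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical noisy dynamics: follow perm with probability 1 - eps,
# otherwise hop to a uniformly random state. Rows and columns sum to 1.
n, eps = 8, 0.1
perm = [3, 6, 0, 7, 2, 5, 1, 4]
P = [[(1 - eps) * (perm[i] == j) + eps / n for j in range(n)] for i in range(n)]

p = [0.4, 0.2, 0.1, 0.1, 0.1, 0.05, 0.03, 0.02]
for t in range(5):
    print(t, round(entropy(p), 4))
    p = step(p, P)
```

With eps = 0 this reduces to the invertible case above and the entropy stays fixed; any eps > 0 pushes the distribution toward uniform.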

I should also mention an interesting tangent. Several people have noticed that the calculations the Bayesian statistical mechanic has to make are going to be quite complicated, at least if the dynamics are interestingly irregular. One might be able to show that an agent which needs predictions sooner rather than later would be better off ignoring historical data, and making its predictions as though the latest measurement were the only one. The error due to this approximation would be less costly than the time needed to make a more accurate calculation. (I suspect mixing will prove to be a necessary condition for this to hold.) This would not explain why the thermodynamic entropy should be the one connected to this tractable approximation, of course, so it doesn't resolve the paradox. Alternatively, physical observers would need to store data about earlier observations somehow, and perhaps one can show (as in the Szilard-Zurek approach to Maxwell's Demon) that the entropic cost of storing the data more than cancels the reduction in the system's entropy. But this doesn't keep the system's entropy from falling. Anyway, this returns us to the water-boils-when-I-grow-ignorant situation.
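The "ignore the history" approximation can be contrasted with the full calculation on the same kind of toy system. This sketch assumes, hypothetically, 16 microstates, the invertible map T(x) = (5x + 3) mod 16, and noise-free coarse readings f(x) = x // 4. The full filter carries every past reading forward; the forgetful agent treats the latest reading as if it were the only one. It illustrates only the shape of the approximation, not the claimed trade-off between accuracy and computing time.

```python
import math

def T(x):
    """Hypothetical invertible dynamics on 16 microstates (gcd(5, 16) = 1)."""
    return (5 * x + 3) % 16

def f(x):
    """Coarse, noise-free 'macroscopic' reading: which quarter the state is in."""
    return x // 4

STATES = range(16)

def full_filter(readings):
    """Keep every past reading: the consistent set is intersected with each
    new reading, then pushed forward through T. Returns the entropy (bits) of
    the uniform distribution on that set after each reading."""
    consistent, out = set(STATES), []
    for r in readings:
        consistent = {x for x in consistent if f(x) == r}
        out.append(math.log2(len(consistent)))
        consistent = {T(x) for x in consistent}
    return out

def forgetful(readings):
    """Use only the latest reading: uniform on the states compatible with it."""
    return [math.log2(sum(1 for x in STATES if f(x) == r)) for r in readings]

x, readings = 6, []          # hypothetical true initial microstate
for _ in range(4):
    readings.append(f(x))
    x = T(x)

print("full:     ", full_filter(readings))  # [2.0, 0.0, 0.0, 0.0]
print("forgetful:", forgetful(readings))    # [2.0, 2.0, 2.0, 2.0]
```

The forgetful agent's entropy stays put while the full filter's collapses; as the paragraph says, this by itself does not explain why thermodynamic entropy should track the forgetful number.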

I'm really not quite sure what to make of this, or where to send it. It's too short for a philosophy-of-science journal, it has little physical content, and the mathematics is quite basic (which does not mean, of course, that it's been properly used, especially not in my hands). The only place I can think of is the "brief reports" section of Physical Review E --- other suggestions would be welcomed.

Manual trackback: the blog formerly known as The Statistical Mechanic; Hyporion

Bayes, anti-Bayes; Physics; Enigmas of Chance

Posted at October 07, 2004 17:45

Three-Toed Sloth

October 07, 2004
