The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi   118

Physics from Fisher Information

A Unification

by B. Roy Frieden

Cambridge University Press, 1998

Laboring to Bring Forth a Mouse

Fisher information is a quantity devised by the great statistician R. A. Fisher in the 1920s, which is supposed to tell us how easy it is to learn about a probability distribution by sampling from it. Suppose that the distribution, given by a probability density p, depends on some parameter, traditionally represented by the Greek letter theta, and here by t. Then the Fisher information, with respect to t, is the mean value of the square of the ratio between the derivative of p with respect to t and p itself --- I = <(dp(x;t)/dt 1/p(x;t))^2>, in the rather clunky symbols HTML allows. The importance of this is that the variance of an unbiased estimate of t is always at least 1/I, so that, as the Fisher information grows, it becomes possible to make more precise and accurate estimates from data. In the general case where there are many parameters, the Cramér-Rao inequality tells us how to put limits on the variances and covariances of their estimates using a generalization of I called the Fisher information matrix.
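As a concrete check of the bound just described, here is a minimal sketch. It assumes a Gaussian density with unknown mean t and known standard deviation sigma, which is an illustrative choice and not an example taken from Frieden's book; the function names are hypothetical. It evaluates I numerically from the definition, then compares the resulting bound for n independent observations, 1/(nI), with the simulated variance of the sample mean.

```python
import math
import random


def gaussian_pdf(x, t, sigma):
    """Density of a Gaussian with mean t and standard deviation sigma."""
    z = (x - t) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fisher_information(t, sigma, half_width=10.0, steps=20000, eps=1e-4):
    """Approximate I = <((dp/dt) / p)^2> with a midpoint Riemann sum in x.

    The derivative with respect to t is taken by central differences.
    """
    lo = t - half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        p = gaussian_pdf(x, t, sigma)
        if p > 0.0:  # skip points where the density underflows to zero
            dp_dt = (gaussian_pdf(x, t + eps, sigma)
                     - gaussian_pdf(x, t - eps, sigma)) / (2.0 * eps)
            total += (dp_dt / p) ** 2 * p * dx
    return total


def variance_of_sample_mean(t, sigma, n, trials=4000, seed=0):
    """Simulate the sample mean of n Gaussian draws and return its variance."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(t, sigma) for _ in range(n)) / n
             for _ in range(trials)]
    grand = sum(means) / trials
    return sum((m - grand) ** 2 for m in means) / (trials - 1)


if __name__ == "__main__":
    t, sigma, n = 1.0, 2.0, 25
    info = fisher_information(t, sigma)
    print("Fisher information per observation:", info)
    print("bound 1/(n I) for n observations:", 1.0 / (n * info))
    print("simulated variance of the sample mean:",
          variance_of_sample_mean(t, sigma, n))
```

For this particular density the exact value is I = 1/sigma^2, and the sample mean happens to attain the bound, so the last two numbers should agree up to simulation noise. Most estimators in most models do no better than the bound and often do worse.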

Now, these ideas have an obvious role to play in physics, but it's the same role they play in any science which uses parameter estimation. Knowing something about the limits of statistical inference is often useful --- constant readers may recall that the Fisher information matrix plays a leading role in Optimum Experimental Designs --- but this is, so to speak, external to the content of the science, whether that be physics, geology or mycology. What Frieden claims here, and in a series of related articles published in the physics journals, is that Fisher information is actually connected to physics in the most profound way, that you can derive physical laws by manipulating Fisher information.

In Frieden's vision, physics is entirely about measurements, and he supposes that the measurements must be like trying to estimate a parameter of a distribution. When we think we are measuring (say) the space-time coordinates of an electron, what we record is the true coordinates plus noise, and from this we try to get the true position, which is the underlying unknown parameter. (On this basis he replaces the conventional Fisher formula with <(dp(x)/dx 1/p(x))^2>.) The way Frieden initially states his program is to ask what dependence of the distribution on these parameters will maximize the Fisher information, subject to certain constraints. The solution to such a problem is generally given by a second-order differential equation --- and physical laws are generally second-order differential equations.
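The replacement formula in the parenthesis above can be unpacked as follows. This is a sketch under the assumption that the noise does not depend on the true value, so that the density depends on x and t only through their difference; the symbols p_0 and q are introduced here for illustration and do not appear in the review.

```latex
p(x;t) = p_0(x - t)
\;\Longrightarrow\;
\frac{\partial p}{\partial t} = -\frac{\partial p}{\partial x},
\qquad
I = \int \frac{1}{p}\left(\frac{\partial p}{\partial t}\right)^{2} dx
  = \int \frac{1}{p}\left(\frac{\partial p}{\partial x}\right)^{2} dx .

% Writing p_0 = q^2 for a real amplitude q (an illustrative substitution),
% with q' the derivative of q with respect to its argument:
I = \int \frac{(2 q q')^{2}}{q^{2}}\, dx
  = 4 \int \left(\frac{dq}{dx}\right)^{2} dx .
```

In this form I is the integral of a squared first derivative. Requiring an integral of the form \int [4 q'^2 - j(q, x)] dx to be stationary, for any derivative-free term j (hypothetical here), gives the Euler-Lagrange equation 8 q'' + \partial j / \partial q = 0, which is second order in x.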

I need to say something here about the orthodox ideas about physical dynamics --- "mechanics," in the jargon. In classical and quantum mechanics, equations of motion are derived from what is called the principle of least action. (The "least" is a bit misleading, as we'll see.) "Action" is a term of our art, meaning a quantity having dimensions of energy integrated over time. The energy in question here is what is called the "Lagrangian," generally equal to the kinetic energy of the system minus its potential energy. In the simplest case, of a single particle moving under the influence of external forces, we say that we know the particle will, at one time, be at a certain position and moving with a certain velocity, and at a second, fixed time will have another position and velocity. We then ask what trajectory, connecting these two points, will minimize the action, and claim the particle will follow that trajectory. If we have multiple particles, fields, and interactions between them, the math becomes more complicated, but the principle does not.

A number of semi-technical points ought to be brought out here. The first is that we don't really look for the trajectory of least action; instead we look for the one where slight changes in the trajectory have the least effect on the total action. The favored trajectory is the one of "stationary variation" in the action, whether this be a minimum, a maximum, an inflection point or a saddle (there are textbook cases of all of these). (Hence this is a "variational problem," and analyzed by the "calculus of variations"; hence also why "principle of least action" is a misleading name.) The solution to such a variational problem is given by an equation involving various derivatives of the Lagrangian. Second, we know (since Newton) that physically realistic trajectories are solutions to second-order differential equations. (In effect, this is what Newton's first two laws of motion tell us.) To ensure that the trajectory we get from the variational problem also obeys a second-order differential equation, we make sure that the Lagrangian includes the square of a first derivative --- which is to say, the kinetic energy. Third, the potential energy term in the Lagrangian, while essential to deriving any behavior more interesting than uniform motion along straight lines, is itself arrived at through tradition, analogy and guess-work. Finally, we require that the Lagrangian as a whole have certain symmetry properties --- that it be invariant under certain kinds of changes in our coordinate system or the physical system, which we figure shouldn't make any difference. Noether's theorem then connects the Lagrangian's symmetries to the physical conservation laws.
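For the simplest case mentioned above, a single particle of mass m with coordinate q in a potential V, the first two points can be written out in a few lines. This is the standard textbook calculation, given as a sketch; note that t here is time, not the statistical parameter from the opening paragraphs.

```latex
S[q] = \int_{t_1}^{t_2} L(q,\dot q)\, dt ,
\qquad
L = \tfrac{1}{2} m \dot q^{2} - V(q) ,

\delta S = 0
\;\Longrightarrow\;
\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0
\;\Longrightarrow\;
m \ddot q = -\frac{dV}{dq} .
```

The second derivative on the left comes entirely from the squared first derivative in the kinetic term; the potential term contributes only the force on the right.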

We are now in a position to begin to see why Frieden's program is nowhere near as impressive as it first sounds. He doesn't really maximize Fisher information; he simply requires that its variation be stationary. Worse yet, he is admirably candid about the fact that simply doing this doesn't give us any very interesting equation of motion. To get that, he subtracts from the Fisher information a new quantity of his own devising, the "bound information," and requires that the difference between these two, which he calls the "physical information," have stationary variation. Now, while he might have plausibly argued that the "correct" physical variables are the most informative ones, I simply cannot see any reason why his physical information should be maximized. (Note however that unlike a Lagrangian, Fisher information is generally not invariant under change of coordinates, e.g. from Cartesian to spherical, so I'd have liked some reassurance on this point, which is not forthcoming; Frieden evidently believes that Nature thinks in Cartesian coordinates.) He tries to justify his "extremal physical information principle" (pp. 79--82) by saying that physicists are in a non-cooperative game with Nature, trying to seize as much data as we can from Her, and the upshot of this is that physical information should have stationary variation. I couldn't say why he thinks this should convince anyone not raised on the lumpenfeminist idea that modern science is a way of raping and torturing Nature.
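The coordinate dependence noted in the parenthesis above can be seen already in one dimension. The following sketch uses the simplest change of coordinate available, a rescaling y = a x with a > 0, rather than the Cartesian-to-spherical case the text mentions; the subscripts X and Y are labels introduced here.

```latex
y = a x , \qquad p_Y(y) = \frac{1}{a}\, p_X\!\left(\frac{y}{a}\right) ,

I_Y = \int \frac{\bigl(p_Y'(y)\bigr)^{2}}{p_Y(y)}\, dy
    = \frac{1}{a^{2}} \int \frac{\bigl(p_X'(x)\bigr)^{2}}{p_X(x)}\, dx
    = \frac{I_X}{a^{2}} .
```

So the value of the shift-form Fisher information depends on the coordinate in which the density is written, whereas a Lagrangian, being a scalar, keeps its value when the same motion is described in new coordinates.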

In any case, adding bound information (or rather, subtracting it off) reduces the scheme to vacuity. Frieden pulls these terms from out of, to put it politely, the air, and they seem to have no independent significance whatsoever. They are simply whatever he needs to get the equation he wants at the end of the variational problem, subject only to the (really rather mild) constraint that they have the right symmetry properties.

In short, if there is any superiority to dealing with Frieden's physical information rather than with the action, he hasn't demonstrated it. Both get the necessary second-order differential equations by sticking in a squared derivative term --- the kinetic energy for Lagrangians, Fisher information for Frieden. Both involve a more or less ad hoc second term, respectively the potential energy and the bound information, to get the right sort of dynamics. Both do not actually guarantee an extremum, merely stationary variation. They may well be equivalent, in the sense that for every physically important Lagrangian, Frieden can come up with a bound information term which delivers the same equations of motion. On what basis, then, could we choose between the schemes?

  1. Empirical success. Frieden's scheme might lead to better predictions than any available Lagrangian. But the vast majority of the book is devoted to showing that it can give the same results as established theories, simply because they are so successful. The one place where he claims to put forward a new theory, or parts of one, is in quantum gravity, which is about as far removed from data as it is possible to get in physics without slipping over the border into mathematics.
  2. Ease of formal manipulation. This would actually be a weighty consideration in his favor, but his scheme is no easier than the standard one, though it doesn't seem to be any worse, either.
  3. Retention of existing theory. Two important aspects of orthodox mechanics are Noether's theorem and Hamiltonians. The former, as mentioned, gives us a way of deriving conservation laws (of energy, momentum, angular momentum, etc.) from the symmetry properties of the Lagrangian. Hamiltonian dynamics is too involved to explain here, but basically gives another way of formulating the least-action principle for conservative systems, and is extremely important in quantum and statistical mechanics, at least as those are conventionally understood. Frieden says nothing about Noether's theorem or conserved quantities --- I'd be willing to bet, very modestly, that his scheme does not allow for them, but I wouldn't be shocked if they could be imported. On the other hand, he deliberately avoids Hamiltonians.
  4. Conceptual unification within physics. Here again we have at best a wash; Frieden claims unification on the grounds that all measurements are uncertain, but one can equally well claim unification on the grounds that all dynamics are about energy and gauge fields (see Lawrie's A Grand Unified Tour of Theoretical Physics). Often (e.g., deriving the form of the Lorentz transformation), what Frieden presents as a unification looks like a lot of labor to get results that are perfectly straightforward without his extra apparatus.
  5. Conceptual unification with statistics. This would be compelling if Frieden had been able to stick to pure Fisher information; but he can't, and his "physical information" plays no role in statistics.
  6. Metaphysical considerations. I have already mentioned the idea that we struggle against Nature for data; but in addition to that odd mix of game theory and gnosticism, Frieden proposes a cosmology where observers create their own "local reality" through measurement, and an epistemology where physics (and presumably all science) is about measurements and measurements alone, and measurement is intrinsically stochastic.

    The first point can, I dare say, be dismissed at once. The prospect of solipsism, like that of suicide, may help us over some of life's rough patches, but it's hardly something which can be established in a book on mathematical physics! But maybe we shouldn't hold this against Frieden's scheme, since his formalism doesn't employ observers, reality-creating or otherwise.

    As to the second point, that physics should be a science of measurements, I have two objections. The first is that lots of physics deals with things which we don't measure (velocity at most points in a fluid, for instance), or even which we cannot measure (e.g., the interior of a star a billion years ago). Measurements give us our evidence about these matters, but they don't constitute them. The second objection is that what Frieden, following the rest of our profession, calls a "measurement" is, on the basis of our own best theories, really an immensely complicated process. If he really wants to look at data, at what's "given" to us, he shouldn't be thinking about the position of an electron at all, but about gauges, pointers, LED displays, and so on. Even that is being generous: transient colored blobs in his visual field is more like it.

    Finally, I have no objection to the notion that dynamics are fundamentally stochastic. But most physicists have accepted this since the 1920s (if not earlier), so this is hardly a selling-point for Frieden's particular scheme. I do note that he has nothing to say about why really fundamental physical theory, in quantum mechanics, should contain a very odd sort of stochasticity, where instead of decent, real-valued probabilities, we have perverse, complex-valued probability amplitudes, whereas that is the sort of thing I'd expect a real unification with statistics would worry over.

To sum up: Frieden's scheme is at best mathematically equivalent to orthodoxy; it adds nothing empirical; it places fundamental and useful concepts in doubt; it does nothing to unify physics either internally or with statistics; and it is associated with some really bad metaphysics, though that last perhaps reflects more on Frieden than on the scheme itself. I see absolutely no reason to prefer this scheme to conventional mechanics; rather the reverse. This is at best an extended mathematical curiosity.

To follow this book you need to know the variational formulation of classical and quantum mechanics at, say, the level of Goldstein, and some prior acquaintance with estimation theory would help. Even then, your time would be better spent reading Greg Egan's science fiction, since he deals with many of the same themes, only more convincingly and with far greater sophistication.

ix + 318 pp., bibliography, index, numerous graphs, pencil sketches by the author of J. A. Wheeler, R. A. Fisher, L. Brillouin, and R. B. Frieden.
Physics / Probability and Statistics
Currently in print as a hardback, US$74.95, ISBN 0-521-63167-X, LoC QC39 F75
30 May 2000
Thanks to Erik van Nimwegen, and to P.-M. Binder, who probably disagrees with me.