Pete Dunkelberg wrote to tell me that William Dembski, senior fellow at the Discovery Institute, the Mathematical Great White Hope of the "Intelligent Design" school of creationism, had a new pre-print out on information theory. So, for my sins, I downloaded it.

- William A. Dembski, "Information as a Measure of Variation" [PDF link]
*Abstract*: Within information theory, information typically measures the reduction of uncertainty that results from the knowledge that an event has occurred. But what if the item of knowledge learned is not the occurrence of an event but, rather, the change in probability distribution associated with an ensemble of events? This paper takes the usual account of information, which focuses on events, and generalizes it to probability distributions/measures. In so doing, it encourages the assignment of "generalized bits" to arbitrary state transitions of physical systems. In particular, it provides a theoretical framework for characterizing the informational continuity of evolving systems and for rigorously assessing the degree to which such systems exhibit, or fail to exhibit, continuous change.

Having now read this production in both the original (7 July 2004) and lightly revised (23 July 2004) versions, my considered judgment is the same as my first reaction: Sweet suffering Jesus.

First, two points for style, and then the substance.

- Ordinary information theory has a perfectly good way of measuring how much we learn from a change in the distribution over an ensemble, called the Kullback-Leibler divergence, or the relative entropy, or simply the information gain. Dembski ought to know this, because what he talks about as the "reduction of uncertainty that results from the knowledge that an event has occurred" is the information gain in going from the unconditional distribution to the distribution conditional on the event. Since he's read Cover and Thomas's standard textbook on information theory, and this is made perfectly clear in chapter 2, this should not be an issue.
- Similarly, physicists and dynamical systems theorists have long had absolutely no problem with looking at the informational properties of quite arbitrary dynamics. Dembski is supposedly a mathematician, so I can understand if he finds books like Complexity, Entropy and the Physics of Information, or journals like Open Systems and Information Dynamics, insufficiently rigorous. (He'd be wrong, but that's another story.) But he might have thought to look around the math library and turn up Patrick Billingsley's wonderful 1965 book on Ergodic Theory and Information, or some back issues of Ergodic Theory and Dynamical Systems and Journal of Statistical Physics. (Incidentally, Dembski's "generalized bits" are just bits.)
- The mathematical core of the paper, such as it is, is the definition of an
information measure, which Dembski calls the "variational information", and
whose defining formula is as follows (I've slightly modified his notation,
replacing Greek letters with Roman):
\[
I(Q|P) = \log_2{\int_{\Omega}{{\left(\frac{dQ}{dP}\right)}^2 dP}}
\]
where $P$ is the old or reference measure, and $Q$ the new
measure we get after some change, assumed to be absolutely continuous with
respect to $P$, so that the Radon-Nikodym
derivative $dQ/dP$ is well-defined. If $P$ is the
ordinary uniform or Lebesgue measure, then $dQ/dP$ is just the
probability density of $Q$, usually written
as $q(x)$.
Now, this is a perfectly respectable generalization of the regular Shannon information, and in fact one with many interesting properties; it will prove very useful in connection with coding theory, hypothesis testing, and the study of dynamical systems. I can say this with complete confidence because this functional is in fact one of the Rényi informations, introduced by Alfred Rényi in a famous 1960 paper, "On Measures of Entropy and Information", in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. I, pp. 547--561. (Was Dembski even *born* in 1960?) In the notation used above, the Rényi information of order $a$, for non-negative real $a \neq 1$, is
\[
I_a(Q|P) = \frac{1}{a-1} \log_2{\int_{\Omega}{{\left(\frac{dQ}{dP}\right)}^a dP}}
\]
which approaches the Shannon information in the limit as $a$ goes to 1. Dembski's "variational information" is clearly just the special case $a = 2$, where the prefactor $1/(a-1)$ equals 1. Dembski correctly derives some of the more basic properties of this quantity, which Rényi established for arbitrary $a$ in his original paper. There does not seem to be any new mathematics in this section whatsoever. (Compare this part of his paper with, e.g., Imre Varga and János Pipek, "Rényi entropies characterizing the shape and the extension of the phase space representation of quantum wave functions in disordered systems", Physical Review E **68** (2003): 026202 [link].)

  One of the best reasons to study these information measures goes roughly as follows. In 1953, the great Soviet probabilist A. I. Khinchin published a list of four reasonable-looking axioms for a measure of information, and proved that the Shannon information was the unique functional satisfying the axioms (up to an over-all multiplicative constant). (I) The information is a functional of the probability distribution (and not of other properties of the ensemble). (II) The information is maximal for the distribution where all events are equally probable. (III) The information is unchanged by enlarging the probability space with events of zero probability. The trickiest one is (IV): if the probability space is divided into two sub-spaces, A and B, the total information is equal to the information content of the marginal distribution of one sub-space, plus the mean information of the conditional distribution of the other sub-space: I(A,B) = I(A) + E[I(B|A)]. (The paper is re-printed in his book on Mathematical Foundations of Information Theory.)
If we relax axiom (IV) to require only that I(A,B) = I(A) + I(B) when A and B are statistically independent, then we get a continuous family of solutions, namely the Rényi informations. This, along with their many applications, has led to a great deal of attention being paid to the Rényi informations in the information-theory literature. A quite crude search of the abstracts of IEEE Transactions on Information Theory reveals an average of at least five papers a year over the last ten years. They are even introduced, though briefly, in Cover and Thomas's textbook (p. 499). Of particular note is the well-established use of Rényi information in establishing results on the error rates of hypothesis tests, a problem on which Dembski, notoriously, claims to be an expert. (The *locus classicus* here is Imre Csiszár, "Generalized cutoff rates and Rényi's information measures", IEEE Transactions on Information Theory **41** (1995): 26--34.) In nonlinear dynamics and statistical physics, the Rényi informations play crucial roles in the so-called "thermodynamic formalism", one of the essential tools of the rigorous study of complex systems. See, in particular, the excellent and standard book by Remo Badii and Antonio Politi, Complexity: Hierarchical Structures and Scaling in Physics (reviewed here). Naturally enough, Dembski *also* claims to be an expert on the measurement of complexity.
- The so-called "continuity spectrum" seems to be nothing more than a confused (and admittedly conjectural) grope towards the idea of distance and divergence measures on manifolds of probability distributions, a topic well-explored in information geometry. These measures have perfectly respectable quantum versions (see chapter 7 of Amari and Nagaoka's Methods of Information Geometry, or this paper by R. F. Streater), without any of the weirdness that Dembski conjectures. (Dembski's discussion of quantum dynamics in any case is very confused; I can best rationalize it by supposing he thinks of quantum time evolution as something like a combination of classical diffusion and cadlag processes, with the cadlag jump-points representing moments of wave-function collapse. This would be pretty bad physics, but in any case the notion of "collapse of the wave function" is very dubious. More modern treatments of quantum mechanics seem to manage to eliminate it in favor of continuous processes of decoherence, as described in, e.g., D. Giulini et al., Decoherence and the Appearance of a Classical World in Quantum Theory, or at a popular level, David Lindley's Where Does the Weirdness Go?.)
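
For readers who want to check the claims in the list above numerically, here is a minimal sketch for finite (discrete) distributions. It assumes the standard normalization of the Rényi information, with a prefactor of $1/(a-1)$ in front of the logarithm, and it assumes $Q$ is absolutely continuous with respect to $P$. The function names and the example distributions are made up for illustration; none of them come from Dembski's paper or from Rényi's.

```python
import math


def relative_entropy(q, p):
    """Kullback-Leibler divergence (information gain) of q relative to p, in bits."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def renyi_information(q, p, a):
    """Renyi information of order a (a >= 0, a != 1) of q relative to p, in bits.

    Assumes the standard 1/(a-1) prefactor, and that q[i] == 0 wherever p[i] == 0.
    """
    total = sum(pi * (qi / pi) ** a for qi, pi in zip(q, p) if pi > 0)
    return math.log2(total) / (a - 1)


def variational_information(q, p):
    """The order-2 formula quoted in the text: log2 of the sum of (q/p)^2 * p."""
    return math.log2(sum(qi * qi / pi for qi, pi in zip(q, p) if pi > 0))


def condition_on(p, event):
    """Distribution p conditioned on an event, given as a set of indices."""
    mass = sum(p[i] for i in event)
    return [p[i] / mass if i in event else 0.0 for i in range(len(p))]


def product(d1, d2):
    """Joint distribution of two independent variables, flattened to a list."""
    return [x * y for x in d1 for y in d2]


if __name__ == "__main__":
    p = [0.5, 0.25, 0.125, 0.125]   # hypothetical reference distribution
    q = [0.25, 0.25, 0.25, 0.25]    # hypothetical new distribution

    # Order 2 reproduces the "variational information" formula.
    print(renyi_information(q, p, 2), variational_information(q, p))

    # Orders close to 1 approach the ordinary relative entropy.
    print(renyi_information(q, p, 1 + 1e-6), relative_entropy(q, p))

    # Learning that an event occurred: the gain is -log2 P(event).
    event = {0, 1}
    q_event = condition_on(p, event)
    print(relative_entropy(q_event, p), -math.log2(sum(p[i] for i in event)))

    # Additivity over independent components, for an arbitrary order.
    p2, q2 = [0.6, 0.4], [0.3, 0.7]
    joint = renyi_information(product(q, q2), product(p, p2), 3)
    print(joint, renyi_information(q, p, 3) + renyi_information(q2, p2, 3))
```

Each printed pair should agree up to floating-point error (the second pair only approximately, since the order is close to 1 rather than equal to it). The conditioning example is the "reduction of uncertainty" case from the abstract: the Radon-Nikodym derivative is constant on the event, so every order gives the same answer as the ordinary information gain.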

Dembski's paper seriously mis-represents the nature and use of information
theory in a wide range of fields. What he puts forward as a new construction
is in fact a particular case of a far more general idea, which was published
forty-four years ago. That construction is extremely well-known and widely
used in a number of fields in which Dembski purports to be an expert, namely
information theory, hypothesis testing and the measurement of complexity. The
manuscript contains exactly no new mathematics. Such is the work of a man described on one of
his book jackets as "the Isaac Newton of information theory". His home page says this is the first in
a seven-part series on the "mathematical foundations of intelligent design"; I
can't wait. Or rather, I *can.*

*Update*, 29 March 2015: Replaced ugly images of mathematical equations with MathJax, added link to Rényi's paper which is now online.

Posted at August 10, 2004 16:45 | permanent link