The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi   91

An Introduction to Natural Computation

by Dana H. Ballard

Complex Adaptive Systems series
MIT Press, 1997

Not Natural Enough

Living organisms are very good at certain kinds of information-processing. A long-standing goal of computer science is to figure out how they do it, so that we can make machines which do likewise. To this end, computer scientists have developed a largish number of computational techniques inspired by information-processors found in nature; these are sometimes, as here, called ``natural computation,'' though the resemblance to what actually happens in vivo is so sketchy that ``naturalistic computation'' might be a better name. This book is, as its title tells us, intended to serve as an introductory survey of these techniques, aimed at undergraduate and beginning graduate students, primarily in computer science but also in the natural sciences.

After an introductory overview, there are five chapters devoted to ``core concepts''; I'll say more about these presently. After the core, we get three chapters on how neural networks can retrieve and associate memories, and how to put the memories in them; three chapters on Markov chains, hidden Markov models and Markov decision processes; two chapters on evolutionary computation; and a summary chapter. The substantive (peripheral?) material is treated in a competent, uninspired way, emphasizing workable algorithms over the justification or analysis of those algorithms. There are better introductory-level treatments of each of Ballard's topics, and he makes no real attempt to bring out inter-connections between them, but it is good to have them all within one set of not-too-widely-spaced covers, especially for survey courses, and to that extent Ballard has done a useful service.

The initial chapters on core concepts are another story. These cover, successively, ``fitness'' (really Jorma Rissanen's minimum description length principle), ``programs'' (really search algorithms), ``data'' (really linear algebra), dynamics and optimization. Together they amount to about a third of the book. Some of this --- not enough --- is a slightly tedious review of math which will be needed later. (Perhaps it will not be review for computer science students?) Much more of it consists of extremely dubious statements about dynamics, biology and information theory.

Ballard claims that the fitness of any biological information-processor is given by the description-length it assigns to data; given two critters, the one which can store (say) retinal images using fewer bits is ipso facto fitter. This is because, he says, the ``minimum-description-length principle measures both execution speed and repertoire'' (p. 27; the claim is repeated in many other places). But this just isn't so. Knowing that a model minimizes description-length tells us nothing about the size of its repertoire (since the description-length is calculated for a single instance of data), and certainly nothing about speed. On Rissanen's own showing, it also says nothing about accuracy, about how often the model will be wrong and by how much. This is partly the running squabble between the two interpretations of probability, but it is not just that. (Ballard assumes that probability theory is Bayesianism.) In biological studies of cognition and behavior, accuracy and related concepts are absolutely essential; the assumption is that the animal needs to represent the environment correctly and use that information to make good decisions. These theories --- say, signal detection theory in psychophysics, or foraging theory in ecology --- are extremely successful and well-confirmed empirically; we know that animals make such computations. Ballard does not mention them at all, and ``accuracy,'' ``reliability,'' ``error'' and related words do not even appear in his index. Of course, representing and processing environmental information requires scarce resources, so to that extent economy is biologically favored, but this must always be traded off against the benefits of increased accuracy; we cannot simply assume that minimality is everything. [Semi-technical footnote about description-length.]
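To make the contrast concrete, here is a toy sketch (mine, not the book's): it fits polynomials of a few degrees to noisy samples of a known signal, computes a crude two-part description length for each (parameter bits plus residual bits under a Gaussian code; the 32-bits-per-parameter cost and the coding conventions are illustrative assumptions, not Rissanen's actual machinery), and separately measures how badly each model errs on fresh data. The description length is a number computed from the single training sample; the accuracy question cannot be answered from that sample alone.

    # Toy sketch: a crude two-part description length versus out-of-sample
    # accuracy.  The coding conventions below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def two_part_length(y, y_hat, n_params, bits_per_param=32.0):
        """Bits for the parameters plus bits for the residuals under a
        Gaussian model (negative log-likelihood converted to bits)."""
        resid = y - y_hat
        sigma2 = max(resid.var(), 1e-12)
        nll_nats = 0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1.0)
        return n_params * bits_per_param + nll_nats / np.log(2)

    def sample(n):
        """The 'environment': a sine wave observed through noise."""
        x = rng.uniform(-1, 1, n)
        return x, np.sin(3 * x) + 0.1 * rng.normal(size=n)

    x_train, y_train = sample(30)
    x_test, y_test = sample(1000)

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        dl = two_part_length(y_train, np.polyval(coeffs, x_train), degree + 1)
        mse = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
        print(f"degree {degree}: {dl:7.1f} bits, test MSE {mse:.4f}")

Whether the shortest description also generalizes best depends on the details of the coding scheme and the noise; the point is that error rates are a different quantity, needing different information, and the book never engages with them.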

The best one can say about all this is that Ballard does not actually use MDL arguments in any of the substantive chapters. Occasionally he waves his hands about it, and makes totally unsupported claims that connect to it (e.g. that natural selection will tend to economize on genetic information --- has he never heard of introns?). Judicious readers may hold their noses and press on, but I worry about the effect this will have on impressionable students who know no better.

The chapter on dynamics is almost as vexing. Ballard claims that, because of Newton's laws, all dynamical systems must be smooth; one might as well argue, on the basis of quantum mechanics, that all dynamical systems must be linear. Even at the microscopic level of statistical mechanics, where Newton's laws are (pretty closely) the relevant equations of motion, smoothness is not guaranteed; when we deal with the highly aggregated variables relevant to natural computation, all bets are off. (Reading Arthur Winfree might be a suitable corrective.) Despite this, Ballard manages to get in only one error when discussing linear dynamics (see errata for p. 101 below). When it comes to nonlinear systems, however, he blunders. Indeed, the very first thing he says about them ---

Linear systems can be analyzed because the behavior over the entire state space can be determined. But nonlinear systems are more complicated, and their behavior can be analyzed only in the neighborhood of equilibrium points. [p. 104]
--- is wrong, so badly wrong that any competent first text on non-linear dynamics would suffice to dispel it. He goes on to mangle the meaning of ``attractor'' (confusing it with ``stable fixed point'') and of ``chaos'' (confusing it with exponential divergence). This is a very painful way of establishing a few mathematical tools he will use in the chapters on neural nets.
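Since the point matters for what follows, here is a toy sketch (mine, not Ballard's) using the logistic map: its long-run behavior can be surveyed over the whole unit interval, not just near equilibria; its attractor need not be a fixed point; and chaos means bounded motion with sensitive dependence on initial conditions, which is not the same thing as bare exponential growth.

    # Toy sketch: the logistic map x -> r*x*(1-x) on [0, 1].
    from math import log

    def long_run(r, x0=0.2, transient=1000, keep=6):
        """Iterate past a transient and report a few subsequent points."""
        x = x0
        for _ in range(transient):
            x = r * x * (1 - x)
        orbit = []
        for _ in range(keep):
            x = r * x * (1 - x)
            orbit.append(round(x, 4))
        return orbit

    def lyapunov(r, x0=0.2, n=100_000):
        """Average of ln|f'(x)| = ln|r*(1 - 2x)| along the orbit; positive
        means nearby orbits separate exponentially on average."""
        x, total = x0, 0.0
        for _ in range(n):
            x = r * x * (1 - x)
            total += log(abs(r * (1 - 2 * x)))
        return total / n

    for r in (2.8, 3.2, 4.0):
        print(f"r = {r}: orbit -> {long_run(r)}, Lyapunov ~ {lyapunov(r):+.3f}")

At r = 2.8 the orbit settles onto a stable fixed point (negative exponent); at r = 3.2 the attractor is a period-2 cycle, which is not a fixed point at all; at r = 4.0 the orbit stays bounded in the unit interval yet has a positive exponent, close to ln 2. Compare the linear map x -> 2x, which separates trajectories exponentially but is unbounded and about as un-chaotic as a system can be. None of this needs anything beyond iteration and a first course in dynamics.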

Ballard describes some elementary brain anatomy, and a few basic facts about the shape of neurons and their speeds. This is the extent of his connection to biology. There is no mention of the immune system, which in many animals is arguably a far more sophisticated computer than the brain, and precious little of the endocrine system and other purely chemical organs of integration and control. He does not mention any empirical study on any aspect of animal behavior or cognition; at the very least there should have been pointers to the literature.

I wanted to like this book a lot, but I ended up fairly disgruntled. Probably there is nothing better presently available for an introductory course on naturalistic computation, but the instructor would need to argue with the text a lot. I'd not recommend it for self-study, unless one is already familiar with some of the ideas of naturalistic computation and wants to learn some new techniques; for me its main value is, in fact, as a reference-compilation of Neat Tricks. Caveat lector.


Disclaimer: I asked for, and got, a review copy of this book from the MIT Press. I have no stake, however, in its success.
Errata: p. 16, implied definition of chaos is wrong (since it includes exponential growth); p. 16, ``but these [chaotic] systems cannot be easily utilized in computation'': see S. Sinha and W. L. Ditto, ``Dynamics Based Computation,'' Physical Review Letters 81 (1998): 2156--2159, and the references for chaos-based computing therein, going back to the 1980s; p. 39, ``the forgoing discussion assumes that the channel is being used to send one symbol at a time'': so does the subsequent discussion, the real distinction being between efficient and inefficient coding; p. 48, ``since biological programs must work online, this data compression translates into a minimum entropy principle. For that reason, living organisms are sometimes colloquially referred to as `entropy eaters' '': actually, the MDL principle includes the maximum entropy principle as a special case (Rissanen, SCiSI, pp. 97--100), and entropy-eating has to do with metabolism in any case; p. 50, ``The minimum-description-length principle was introduced by Jorma Rissanen in Stochastic Complexity in Statistical Inquiry'': rather, his paper (cited in SCiSI) ``Modeling by shortest data description,'' Automatica 14 (1978): 465--471; p. 58, ``minimax is used in two-person games'': no, or rather, not just, as the text implies; p. 58, ``it is always right to discretize any arbitrary description'': discretization can lead to profound differences in even qualitative behavior, not just quantitative details, as is well-known for shift maps, and an essential part of the Blum-Shub-Smale results about super-Turing computation; p. 65, definition of a minimax strategy is actually a definition of a procedure which can approximate minimax, with a little luck; p. 101, for ``the sum of n first-order LDEs'', read ``a system of n coupled first-order LDEs''; p. 113, identifies objective functions with ``entropy measures,'' though none of his subsequent objective functions is an entropy measure; p. 116, the subscript notation for partial derivatives is used, but not explained until p. 123; pp. 137--138: ignores the extensive cortical connections going ``downward'' in the hierarchy; after that I gave up noting them.
xxiv + 307 pp., 152 black and white diagrams, 25 tables, bibliographic notes at the end of each chapter, subject index (spotty)
Cognitive Science / Computers and Computing / Neuroscience
Currently in print as a hardback, US$50, ISBN 0-262-02420-9, and as a paperback, US$32.50, ISBN 0-262-52258-6, LoC QP356 B345
28--29 July 1999