The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi   120

Truth from Trash

How Learning Makes Sense

by Chris Thornton

Complex Adaptive Systems series
MIT Press, 2000

Two Cheers for Trash

We have here two books for the price of one.

The first is an engaging and fairly non-technical survey of the field of machine learning, somewhere between the level of New Scientist and a friendly intro. textbook (which doesn't seem to exist for the field). This book is excellent and highly recommendable, if sometimes a bit too cute for its own good.

The other book claims to draw out the implications of machine learning, its achievements and limitations, for a host of Deep Issues --- induction and the reliability of knowledge, the nature of the scientific endeavor, questions of representation and embodiment in cognitive science, the nature of pattern or regularity, even the reality of theoretical entities. It would be kind to describe the thinking displayed in this book as ``sloppy''.

Unfortunately, it is not easy to disentangle the two books. I'll start with the good one.

The first big point Thornton makes is that machine learning is equivalent to prediction. That is, whenever we can specify some learning task sufficiently precisely to get a machine to do it, it's equivalent to having the machine learn to do predictions of a certain kind. We always want the machine to learn a rule of the form ``If X, then Y''; which is the same, from its perspective, as learning to predict Y when it sees X. In what is called ``concept learning,'' for instance, the output Y is a binary variable, 1 if X is an instance of a certain concept (e.g., ``edible fruit'') and 0 otherwise. Learning the concept is equivalent to learning to predict 0 or 1, depending on X. And so on for more complicated classifications, regression, learning behaviors, and so forth. All this is quite sound. I actually thought this went without saying, but Thornton evidently felt a need to convince his fellow computer scientists of it.
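
To make the equivalence concrete, here is a minimal sketch (mine, not Thornton's) of concept learning cast as binary prediction; the fruit features and the rule are invented purely for illustration.

    # Concept learning as binary prediction: the "concept" becomes a function
    # from feature vectors to {0, 1}.  Features and rule are made up for
    # illustration; learning would mean arriving at some such rule from
    # labeled examples, and using it is just prediction.
    def edible_fruit(features):
        weight_grams, smells_sweet = features
        return 1 if (smells_sweet and weight_grams < 500) else 0

    print(edible_fruit((120, True)))    # -> 1: an instance of the concept
    print(edible_fruit((800, False)))   # -> 0: not an instance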

The second big point is that, for a broad class of learning/prediction problems, we have methods which work quite well, and are pretty simple at their core. The easiest is just what's called ``nearest neighbor'': given an unknown input, predict the output of the most similar input whose value is known. One can elaborate such similarity-based methods considerably, but in essence they all involve what Thornton calls ``fence-and-fill'': carve up the input space into simple blocks (with ``fences''), and treat every point in the block the same way (``fill''). The refinements of fence-and-fill methods have to do with how partitions are created and shifted in response to training corrections. (A particularly simple case of fence-and-fill learning is dividing the input space by straight lines; if this is successful, the problem is said to be linearly separable, and this special case seems to motivate a lot of Thornton's thinking, as we'll see.)
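
Here is a minimal sketch of the nearest-neighbor idea, the simplest of the similarity-based methods; the toy data and the squared-Euclidean distance are my own placeholders, not anything from the book.

    # 1-nearest-neighbor prediction: give an unknown input the output of the
    # most similar input whose value is already known.
    def nearest_neighbor_predict(train, query):
        """train: list of (input_vector, output) pairs; query: input_vector."""
        def squared_distance(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        _, output = min(train, key=lambda pair: squared_distance(pair[0], query))
        return output

    training_data = [((1.0, 1.0), "red"), ((5.0, 5.0), "blue"), ((1.5, 0.5), "red")]
    print(nearest_neighbor_predict(training_data, (0.9, 1.2)))   # -> "red"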

The third big point is that some problems are not very amenable to fence-and-fill learning. He calls these ``relational'' problems, which I think is a bad name --- but let us leave that for discussing the other book within these covers. The prototype of these problems, to Thornton's mind, is parity: given a number, guess whether it is even or odd. The kicker here is that guessing a number's parity from its nearest neighbor's is always wrong, and no finite number of (contiguous) partitions will work. If fence-and-fill won't work, what will?
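
A quick sketch of why parity defeats the similarity guess: the integers nearest to n are n - 1 and n + 1, and both have the opposite parity, so the nearest-neighbor prediction is wrong every single time. (The even/odd split of the data below is my own toy setup.)

    # Parity vs. nearest neighbor: predict a number's parity from that of its
    # nearest neighbor among the numbers we have already seen.
    def parity(n):
        return n % 2

    def nearest_neighbor_parity_guess(n, known):
        return parity(min(known, key=lambda m: abs(m - n)))

    known = list(range(0, 100, 2))       # we have seen the even numbers
    queries = list(range(1, 100, 2))     # and are asked about the odd ones
    wrong = sum(nearest_neighbor_parity_guess(n, known) != parity(n) for n in queries)
    print(wrong, "wrong out of", len(queries))   # -> 50 wrong out of 50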

Two traditional strategies that Thornton considers, but dismisses as not really adequate, are what he calls ``supercharged fence-and-fill'' and ``pick-and-mix''. The first uses various tricks to make fence-and-fill easier to apply, in the hopes of getting some mileage out of it. One can expand the dimensionality of the input space (which in a sense makes it more likely that it clusters), or increase the number and complexity of the fences available. On the other hand, pick-and-mix seeds the learner with some basic relations which it may find helpful, and has it construct increasingly elaborate combinations of these relations in the hopes of predicting more accurately. The problem with supercharged fence-and-fill is that it doesn't really address the nature of the problem (or so Thornton says); the problem with pick-and-mix is that it demands prior knowledge of what is likely to work in this problem domain. Thornton regards neither of these as satisfactory.

What he does regard as satisfactory is the ``truth from trash'' procedure of his title. It works like this:

  1. Use the chosen fence-and-fill method to process the current input data, that is, to find regions of uniformly labeled stimuli.
  2. Recode the current input data so as to draw identically labeled data points closer together. ...
  3. Assign the derived data to be the current data and apply the procedure recursively until data are generated that can be satisfactorily processed solely by the fence-and-fill method. [p. 171]
As Thornton notes, truth-from-trash does not have to assume that the clusters it develops at each stage are meaningful in themselves, just that they are going to be useful in deriving a yet-more-accurate cluster at the next step. Thornton doesn't prove that this kind of ``proto-representational learning'' will do any better than supercharged fence-and-fill, much less than appropriately chosen pick-and-mix, but I agree that it does have a better feel to it than the former, and avoids the latter's need for expert priming. (Curiously, Thornton does not refer to any more formal publication on truth-from-trash, either here or on his web-site.)
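
For concreteness, here is a skeletal reading of the quoted procedure, emphatically my own paraphrase and not Thornton's code; the base learner, the recoding step, and the stopping test are all placeholders to be supplied.

    # Truth-from-trash, as I read the three quoted steps: cluster with a
    # fence-and-fill learner, recode the data so identically labeled points
    # move closer together, and recurse until fence-and-fill alone suffices.
    def truth_from_trash(data, fence_and_fill, recode, good_enough, max_depth=10):
        """data: list of (input_vector, label) pairs; the other arguments are
        placeholder functions standing in for whatever one actually uses."""
        for _ in range(max_depth):
            regions = fence_and_fill(data)      # step 1: regions of uniform label
            if good_enough(regions, data):      # stop when fence-and-fill suffices
                return regions
            data = recode(data, regions)        # steps 2-3: re-represent, recurse
        return fence_and_fill(data)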

One of the nicer aspects of truth-from-trash is that it shows how to construct the kind of abstract, decoupled, ``symbolic'' representation beloved of Good Old-Fashioned AI from the more dynamical, input-driven, connectionist devices of ``nouvelle'' AI. (Pace Thornton, truth-from-trash is a symbol system within the meaning of the act, or at any rate of Herbert Simon's definition. So are connectionist systems.) This is all to the good, since (as emphasized by Daniel Dennett, Andy Clark, and others on the side of the angels) cognition and the nervous system plainly employ both reactive, embodied, nouvelle-ish devices and abstractive, symbol-manipulating, disembodied devices, and no good can come of pretending that one or the other doesn't exist, or that they can't work together. *

Let me now turn to the bad book, which begins about where Thornton starts talking about hard learning problems, ones fence-and-fill doesn't work very well on. As mentioned, he calls them ``relational,'' but it will become clear why I think that's not a very good name for what he has in mind. He offers three characterizations of these problems, one of which is definitely not equivalent to the other two, which in turn are not obviously equivalent to each other, though they may be. He does not use ``relation'' in the mathematical-logical sense of a set of ordered pairs (or more generally n-tuples). After all, what similarity-learning does is learn a relationship between the input values and the output, in this sense. So he needs to mean something more.

The first characterization he offers of what he means is that a relational problem is one where the absolute values of the inputs are uninformative as to the output, but the presence or absence of some relation among the inputs is. This can be characterized perfectly straight-forwardly in information-theoretic terms (though he does not do so): a problem is perfectly relational in this sense if the mutual information between the output variable and any one of the input variables is zero (and the mutual information between the output and the whole set of the input variables is maximal). Moreover, this certainly captures an obvious sense of ``relational''. It's not obvious that this must be much harder than what similarity-learning does, however.
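
The information-theoretic reading is easy to check numerically. A small sketch of my own, using a brute-force plug-in estimate: for two-bit parity (XOR), either input bit alone carries zero information about the output, while the pair together carries the full bit.

    from collections import Counter
    from itertools import product
    from math import log2

    def mutual_information(pairs):
        """Plug-in estimate of I(A; B) in bits from a list of (a, b) samples."""
        n = len(pairs)
        joint = Counter(pairs)
        pa = Counter(a for a, _ in pairs)
        pb = Counter(b for _, b in pairs)
        return sum((c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
                   for (a, b), c in joint.items())

    # Two-bit parity: Y = X1 XOR X2, inputs uniform over {0,1}^2.
    samples = [((x1, x2), x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]
    print(mutual_information([(x[0], y) for x, y in samples]))   # I(X1; Y) = 0.0
    print(mutual_information([(x[1], y) for x, y in samples]))   # I(X2; Y) = 0.0
    print(mutual_information(samples))                           # I((X1, X2); Y) = 1.0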

The second characterization is given by his measure of ``geometric separability,'' defined as ``the proportion of data-points whose nearest neighbors share the same output''. This clearly means that problems with low geometric separability are not amenable to similarity-methods, and perhaps not to fence-and-fill at all (though I'd be much more cautious about that, absent a proof, than he is). But this is not equivalent to his first characterization (a counter-example will be presented presently).
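
The measure is simple enough to compute; here is a sketch of the definition as quoted (my code, not his), and on the parity problem it comes out at zero, as it should.

    # Geometric separability: the proportion of points whose nearest (other)
    # neighbor shares the same output.
    def geometric_separability(data):
        """data: list of (input_vector, output) pairs."""
        def squared_distance(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        hits = 0
        for i, (x, y) in enumerate(data):
            neighbor = min((pair for j, pair in enumerate(data) if j != i),
                           key=lambda pair: squared_distance(pair[0], x))
            hits += (neighbor[1] == y)
        return hits / len(data)

    # Parity on 0..99: every integer's nearest neighbor has the opposite parity.
    parity_data = [((n,), n % 2) for n in range(100)]
    print(geometric_separability(parity_data))   # -> 0.0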

Finally, he characterizes relational problems as those where the proper partitions are not ``geometric or spatial'' (p. 119). Absent a characterization of what he means by ``geometric or spatial,'' this merely indicates an impoverished view of geometry! (I say this despite being an almost entirely verbal thinker.) Given his preoccupation with linear separability and the like, I guess that what he has in mind is something on the order of problems where the correct partition cannot be represented by a finite number of simply-connected compact sets.

Let me demonstrate that his characterizations are not equivalent with a counter-example. Consider the relationship <, ``less than,'' among integers (positive and negative). Our input space is now the x, y plane, and we color a point red if x < y, and white otherwise. Every point on or below the 45 degree diagonal will be white, and every point above that line will be red. Now, clearly just knowing the x coordinate of a point tells us absolutely nothing about its color. Thus, as we'd hope, learning the relationship < is a relational problem in sense (1). Unfortunately, in senses (2) and (3), it is not relational. Confined to a finite range of inputs, the geometric separability is very high, since only points on either side of the line x = y have neighbors who differ in value from themselves; in the limit that we allow x and y to range over all integers, the geometric separability approaches 1. And not only can the problem be tackled geometrically, it is even linearly separable.
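
The counter-example is easy to check numerically with the separability measure above; on a modest grid of integer pairs (my own choice of range), the ``less than'' coloring comes out almost perfectly separable, and only the points hugging the diagonal can miss.

    from itertools import product

    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Geometric separability of the "x < y" coloring on a finite integer grid.
    points = [((x, y), int(x < y)) for x, y in product(range(-10, 11), repeat=2)]
    hits = 0
    for i, (p, label) in enumerate(points):
        _, neighbor_label = min((pt for j, pt in enumerate(points) if j != i),
                                key=lambda pt: squared_distance(pt[0], p))
        hits += (neighbor_label == label)
    print(hits / len(points))   # about 0.95, and it rises toward 1 on larger grids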

I conclude that what Thornton really needs is either (2) or (3), but that neither of these should be called ``relational''. (I do not have a handy counter-example to the equivalence of senses (2) and (3), and for all I know they may be equivalent.)

There is also a numerical argument made, that in a non-relational problem the number of partitions to be considered is finite, whereas in a relational one it is infinite. This is not so, but he seems to have been thinking of the following point. The number of binary partitions of a single variable, which has N possible values, is the number of subsets of the input space, which is 2^N. The number of possible relations among two variables, taking on N and M possible values, is 2^(NM), since it is just the number of subsets of the product space, which has cardinality NM. For realistic values of N and M, 2^(NM) is much, much larger than 2^N (or 2^M for that matter). But really in the case that concerns Thornton, the number of values of the input vector as a whole is what's relevant in the first case, and that's NM, so the number of possible partitions is the same.
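
Just to fix ideas about the sizes involved (the particular numbers below are mine, not Thornton's):

    # Counting candidate binary partitions: a single variable with N values
    # admits 2**N labelings of its value space; a pair of variables with N and
    # M values admits 2**(N*M), since that is the size of the product space.
    N, M = 16, 16                      # illustrative sizes only
    print(2 ** N)                      # 65536
    print(len(str(2 ** (N * M))))      # 2**256 runs to 78 decimal digits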

Whatever you call these problems, they clearly exist and need to be solved. Supercharging exploits the fact that many problems will partially yield to fence-and-fill, and regards that as better than nothing. Thornton is rightly skeptical of this, and has an elaborate parable of a poker-playing machine to make his point. Pick-and-mix will work, but only if the primitive relationships it's equipped with are adapted to the task. Thornton seems to doubt that we can arrange for such adaptation in general, as well as rejecting it on the grounds that it doesn't guarantee learning will work. Unfortunately, as he himself points out, one can show that no such guarantee is possible. For any problem where a given learning method does better than blind search, there is another problem where it does worse, and one cannot guarantee, in a completely a priori fashion, that a malevolent demon (or, as the computer scientists suggestively put it, an Adversary) will not arrange for us to always try our luck on problems ill-suited to our tools. This is what is called the ``No Free Lunch'' theorem in machine learning. Its relevance to the actual world is, to say the least, not immediately obvious, since we do not sample the space of all possible learning problems uniformly, or even under the direction of an Adversary. It is thus, pace Thornton, not at all absurd to suggest that we can learn what learning algorithms have above-chance performance on the kinds of problems we actually encounter. In other words, the background knowledge for pick-and-mix is acquirable. (I learned this counter to the no-free-lunch argument from John Holland; so far as I know it's original to him.)

That the products of one level of learning or neural processing are the raw materials of subsequent levels, which Thornton makes out to be something of a breakthrough associated with truth-from-trash, is in fact a very old notion. He actually quotes John Locke to this effect, without quite realizing, it seems, how ancient and commonplace this notion is. I will merely repeat one of my favorite passages from William James, who puts it quite nicely:

[T]he mind is at every stage a theater of simultaneous possibilities. Consciousness consists in the comparison of these with each other, the selection of some, and the suppression of the rest by the reinforcing and inhibiting agency of attention. The highest and most elaborated mental products are filtered from the data chosen by the faculty next beneath, out of the mass offered by the faculty below that, which mass in turn was sifted from a still larger amount of yet simpler material, and so on. The mind, in short, works on the data it receives very much as a sculptor works on his block of stone. In a sense the statue stood there from eternity. But there were a thousand different ones beside it, and the sculptor alone is to thank for having extricated this one from the rest. Just so the world of each of us, how so ever different our several views of it may be, all lay embedded in the primordial chaos of sensations, which gave the mere matter to the thought of all of us indifferently. We may, if we like, by our reasonings unwind things back to that black and jointless continuity of space and moving clouds of swarming atoms which science calls the only real world. But all the while the world we feel and live in will be that which our ancestors and we, by slowly cumulative strokes of choice, have extricated out of this, like sculptors, by simply removing portions of the given stuff. Other sculptors, other statues from the same stone! Other minds, other worlds from the same monotonous and inexpressive chaos! My world is but one in a million alike embedded, alike real to those who may abstract them. How different must be the worlds in the consciousness of ant, cuttlefish, or crab! [Principles, vol. I, ch. IX, p. 288]
Thornton regards this by-now elementary fact as delivering a crushing blow to the objectivity of perception and to scientific realism. Of course it does nothing of the kind; it is perfectly possible that the ``mental products'' we have constructed are, in fact, the right ones, the ones in which the world is best described. Information-theoretically, that would mean that our concepts are the ones in which prediction is both maximally accurate and minimally complex.

Thornton rightly connects the kind of change-of-representation which truth-from-trash works with creativity, or at any rate with one aspect of it. His speculations about the sources of aesthetic pleasure seem eminently dismissable (art does not aim at accurate representation, and probably for most of its history creativity as such has not been valued), and he doesn't seem to be aware of a large and respectable literature on computational modeling of creativity and change-of-representation, e.g., the excellent book of Margaret Boden (who is, as it happens, Thornton's colleague in the evolutionary and adaptive systems group at the University of Sussex!). The quotations from Einstein about the superiority of imagination to knowledge are more fit for dorm-room posters than serious discussion.

The trick of mushing the input space into a feature space where easy learning methods work better is not original to truth-from-trash. Thornton surely knows this --- it's featured in tomes on pattern recognition since time out of mind, i.e., the early 1970s at least, and is the core of the ``support vector machine'' approach of Vapnik and co. --- but I think any naive reader would come away thinking otherwise. The main novelty here is the emphasis on a recursive, data-driven recoding, whereas, e.g., the SVM crowd tend to emphasize the use of fixed, hand-crafted kernels.
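
For readers who have not met the trick, here is a toy version of it, not drawn from the book: the XOR problem is not linearly separable in its raw two-bit inputs, but recoding each input as (x1, x2, x1*x2) makes a single linear fence suffice.

    # Recoding into a feature space where a linear fence works: XOR is not
    # linearly separable on (x1, x2), but it is on (x1, x2, x1*x2).
    def recode(x1, x2):
        return (x1, x2, x1 * x2)

    def linear_fence(features):
        # In the recoded space the hyperplane x1 + x2 - 2*(x1*x2) = 0.5 separates XOR.
        x1, x2, x1x2 = features
        return 1 if (x1 + x2 - 2 * x1x2) > 0.5 else 0

    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print((x1, x2), linear_fence(recode(x1, x2)), "target:", x1 ^ x2)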

Outside his immediate domain of machine learning, Thornton is not an altogether reliable informant. For instance (pp. 45--46), after correctly giving the formula for the (Shannon) information contained in a random variable, he tells us that this equation is ``identical --- save for the leading minus --- to the entropy equation from statistical thermodynamics. Hence the name `negentropy' '' (his italics) for information. Alas, as a glance at any text on statistical mechanics would show, we physicists use the minus sign too, and in fact thermodynamic entropy is the Shannon information of the atomic-level physical variables. (This is actually a rather profound and important point in our discipline.) --- The name ``negentropy,'' of course, arose in quite a different way, and is in any case now obsolete, e.g., it does not appear in the index to Cover and Thomas's definitive book.
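
For the record, the standard textbook forms of the two quantities (supplied here by me, not quoted from the book) both carry the minus sign, and differ only in units and the base of the logarithm:

    H[X] = - \sum_i p_i \log_2 p_i       (Shannon information/entropy)
    S    = - k_B \sum_i p_i \ln p_i      (Gibbs entropy in statistical mechanics)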

Or again, consider his description of Karl Popper's philosophy of science (pp. 138--139; all italics Thornton's):

Popper argues that since confirmation of inductively derived theories and laws is out of the question, science should proceed by focusing on refutation. A single counterexample is sufficient to falsify a theory with absolute certainty. Science can therefore build secure foundations by accumulating falsifications. The methodological implications follow automatically: respectable scientists should attempt to select new theories according to their degree of falsifiability rather than according to their degree of plausibility.

Unfortunately, although Popper's scheme has a certain zany appeal, it is not entirely practical. If we take his proposal literally, there is nothing to prevent us --- as scientists --- from selecting a long sequence of incorrect theories. Provided each theory is falsifiable, we are proceeding in an entirely proper, Popperian manner. But the net effect is, of course, that no real progress is ever made.

Now, one can justly criticize Popper's philosophy of science on many grounds, but claiming that it will not prevent us from making mistakes is hardly one which can be taken seriously. More importantly, this completely misses Popper's point, which is that we should prefer theories which have survived a long series of serious attempts to falsify them --- theories which would not have survived those attempts, had they been false. A scientist who merely succeeds in falsifying lots of stupid theories (e.g., ``The Moon is made of green cheese''; ``The Moon is made of blue cheese''; ``The Moon is made of munster''; ``The Moon is mostly cheddar, except for the Sea of Tranquility, where it is parmesan'') is not proceeding in a good Popperian manner.

The discussion of quantum mechanics is appalling. Wave-particle duality is a thoroughly obsolete notion, a needlessly misleading way of saying that models of quantum phenomena have features reminiscent of models of classical waves and of models of classical particles. But they are mathematically consistent beasts in their own right, and saying that this is paradoxical, much less saying that they're both waves and particles as observation demands, is a bit like saying that human beings exhibit squirrel-ostrich duality --- we're squirrels in that we are mammals, and ostriches in as much as we are flightless bipeds. (The poet is right when she calls us ``hybrids of plants and ghosts,'' but that's metaphor, not ontology.) Nor does the uncertainty principle have anything to do with measurements disturbing the measured; it doesn't have anything to do with measurements at all! The worst of it is, he doesn't need anything about quantum mechanics at all! Thornton is trying to make the point that good scientific theories generally do not make sense of things using our everyday, ``lifeworld'' concepts, a point which can be made by using any developed science. Classical mechanics would do: those who have taught it, or remember learning it, or know the history of its development, can all testify that its notions are not intuitive ones. The same is true, as I said, of every science which has reached a creditable stage of development; employing intuitive notions (as do, e.g., some branches of sociology and psychology) is a prima facie sign that the discipline is not genuinely explanatory at all. But I would say that this is, if anything, reason to think that scientific concepts do tell us something about reality, independent of ourselves. If we were just fooling ourselves, physics and biology would look like psychotherapy, or literary criticism, or libertarian political theory, which once upon a time --- before 1600, roughly, for physics --- they did.

Let me finally come around to the Deepest Issue that Thornton deals with, namely induction and the reliability of empirical knowledge (ch. 10). The quandary is this. All our knowledge of the real world --- of ``matters of fact and existence,'' in David Hume's words --- derives from empirical evidence and induction. But induction cannot be deductively justified a priori, cannot be justified from an argument with an empirical premise (on pain of regress), and cannot even be justified inductively, on the grounds that it's generally worked in the past, since that is transparently a vicious circle. (Skeptical arguments against induction are ancient; Hume seems to have been the first to give the vicious-circle argument.) Therefore, induction cannot be justified at all, and we have no real reason --- as opposed to mere animal faith --- to think that any of our generalizations about the real world will continue to hold.

Bertrand Russell once said that a special section of Hell is reserved for those who claim to have refuted Hume on this point. Thornton is not in danger of perdition (not on that count anyway), but a longish spell in Purgatory seems fair. He claims that the Humean predicament can be evaded by worrying about data compression instead. We know that we can compress data, and we can easily tell whether one compression scheme works better than another on a given hunk of data. Every compression scheme more or less explicitly works by extracting regularities, so where's the problem? But this misses the point, which is that we don't just want to compress this particular hunk of data; we want a way of getting regularities worthy of the name, ones which will continue to hold good for new data, and compression as such does not provide this. Certainly schemes like Rissanen's minimum description length principle do not. One can prove that, if certain regularities continue to hold, then a good compression scheme for our training data will, very likely, remain a good way of compressing test data, but that does not address the point.

In the same connection, Thornton justifies Occam's Razor, not as a heuristic but as a tautology, in a really extraordinary piece of reasoning:

Simpler inductive inferences necessarily make fewer assumptions about the data. Every assumption has an even chance of being invalid. Thus simpler inductive inferences will, in general, be predictively more successful. If it has any effect at all, then, the application of Occam's Razor within any sort of inductive process must improve predictive accuracy. [p.162, his italics]
This is absolutely astonishing, coming from somebody who certainly knows about the existence and importance of the accuracy-complexity tradeoff in modeling. If what he says here were literally the case, then we would have no reason to worry about overfitting... (A simple counter-example from my own field: When modeling stochastic processes, the simplest model, by a very well-defined metric, is the one which always treats the process as a sequence of independent, identically-distributed random variables, i.e. as biased coin-tossing or dice-rolling. This is admirably simple, but its accuracy is, in general, dismal.) So he can't mean this literally; but then what does he mean?
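
A toy version of that counter-example, with data and models invented purely for illustration: on a strongly patterned binary sequence, the ``simplest'' model, which treats the symbols as independent coin flips, predicts at chance, while a slightly more complex model that conditions on the previous symbol is nearly perfect.

    # Simplest is not most accurate: predicting a patterned binary sequence.
    import random

    random.seed(0)
    seq, state = [], 0
    for _ in range(10000):                 # alternates with probability 0.95
        state = 1 - state if random.random() < 0.95 else state
        seq.append(state)

    # "IID" model: ignore order, always predict the overall majority symbol.
    majority = max(set(seq), key=seq.count)
    iid_accuracy = sum(s == majority for s in seq) / len(seq)

    # Markov model: predict the flip of the previous symbol.
    markov_accuracy = sum((1 - seq[i - 1]) == seq[i]
                          for i in range(1, len(seq))) / (len(seq) - 1)

    print(round(iid_accuracy, 3), round(markov_accuracy, 3))   # roughly 0.5 vs 0.95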

I'm only going to touch on the very strange idea that every assumption has an even chance of being wrong, by saying that I don't think it makes any sense. To quote C. S. Peirce, ``if universes were as plenty as blackberries, if we could put a quantity of them in a bag, shake them well up, draw out a sample and examine them,'' then we could give some kind of meaning to the chance of an assumption being wrong. (Even from a Bayesian, subjective-probability view, there is no reason to suppose that all those probabilities are 50-50, and indeed much, like the ``convergence to truth'' theorem, to doubt it.) This assumes that it's possible to individuate assumptions --- to find some sensible way of counting how many of them an inductive method makes.

Let me say that I don't think Thornton is a dumb man or even a bad researcher. On the contrary, I came to this book with high expectations because his professional papers show he is a good researcher. This book does not do him justice. He has attempted to combine popular exposition, a series of new technical proposals, and philosophical reflections in a single short volume, and, as might've been expected, all his projects have suffered. I have no doubt that he is aware of the bulk of the related research I have faulted him for not mentioning, and only very little doubt that he would have cited it had he not also been aiming for a popular audience. His carelessness --- and, as the case of negentropy shows, confabulations --- with material from other disciplines is much more troubling, but again, if he were writing for a more hard-bitten (not to say embittered) audience of researchers, I think that would be better controlled. On the other hand, he certainly can write popular science, though he is a bit too fond of using cute stories to make his points. What he cannot do (here I perform induction on a single instance) is succeed at writing for both audiences at once. I will certainly continue to follow his technical papers with every expectation of profiting from them, and if he writes purely popular books I'll happily read them. (Warning: an awful and unfair but inevitable pun follows.) As for this hybrid, I'm afraid that he has done too little to separate the truth from the trash.


Disclaimer: I asked for, and got, a review copy of this book from the publisher. Needless to say, I have no stake in its success.
Errata: p. 12, for ``commodities'' read ``securities''; p. 12, for ``wheat stocks,'' read ``wheat futures''; p. 35, Kepler had to make the shells of his Platonic-solids model of the universe thick to accommodate, not the elliptical nature of the orbits (which he hadn't recognized yet), but the epicycles; p. 146, ``Max Planck discovered that excited electrons emit and absorb energy only in specific amounts, which he termed quanta'' (his italics): electrons, excited or otherwise, played no role in Planck's derivation of the black-body law and the quantization of the electromagnetic field (the black body spectrum, in fact, contains photons of all energies); p. 150, ``the best code for a data set is one in which the lengths of codes are inversely proportional to the frequency of the corresponding encoded sequences'' (his italics): actually they are proportional to the logarithm of the inverse of the frequencies (see, e.g., Cover and Thomas, chapter 5). In all URLs printed in the text, the tilde character ~ is rendered as a tilde accent over the next alphabetic character, as would result from the TeX error of writing \~x instead of $\sim$x.
xii + 204 pp., index of names and subjects (sparse), end-notes for each chapter, bibliography, numerous tables and graphs, a few black and white photos (or rather, quite poorly digitized reproductions of photos)
Artificial Life and Agents / Cognitive Science / Computers and Computing / Philosophy of Science / Popular Science / Probability and Statistics
Currently in print as a hardback, US$29.95, ISBN 0-262-20127-5, LoC Q325.4 T47.
19--20 June 2000
Thanks to Bill Tozier and Danny Yee for advice.