August 19, 2005

Our New Filtering Techniques Are Unstoppable!

Attention conservation notice: 2800 words and many large figures advertising our latest scientific paper. More than most people will ever want to know about nonlinear filtering, cellular automata and coherent structures.

A year and a day after we began working on the manuscript, here it is:

Cosma Rohilla Shalizi, Robert Haslinger, Jean-Baptiste Rouquier, Kristina Lisa Klinkner and Cristopher Moore, "Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems", Physical Review E 73 (2006): 036104 = nlin.CG/0508001 [Smaller, higher-quality PDF]
Abstract: Most current methods for identifying coherent structures in spatially-extended systems rely on prior information about the form which those structures take. Here we present two new approaches to automatically filter the changing configurations of spatial dynamical systems and extract coherent structures. One, local sensitivity filtering, is a modification of the local Lyapunov exponent approach suitable to cellular automata and other discrete spatial systems. The other, local statistical complexity filtering, calculates the amount of information needed for optimal prediction of the system's behavior in the vicinity of a given point. By examining the changing spatiotemporal distributions of these quantities, we can find the coherent structures in a variety of pattern-forming cellular automata, without needing to guess or postulate the form of that structure. We apply both filters to elementary and cyclical cellular automata (ECA and CCA) and find that they readily identify particles, domains and other more complicated structures. We compare the results from ECA with earlier ones based upon the theory of formal languages, and the results from CCA with a more traditional approach based on an order parameter and free energy. While sensitivity and statistical complexity are equally adept at uncovering structure, they are based on different system properties (dynamical and probabilistic, respectively), and provide complementary information.

Rob and Kristina worked out the fundamental theory and algorithms for this paper and its predecessor; Rob also figured out the order parameter for cyclic cellular automata, and Kristina did the actual statistical analysis. Jean-Baptiste implemented all our ideas (in Objective CAML), and ran all the simulations. Cris came up with the idea of local sensitivity, and pushed for harder examples. I pushed for local statistical complexity, and a lot of misconceptions.

OK, assuming anyone's still reading, let me give you an illustration of the kind of thing we're talking about in the abstract.

That is a picture of the time-evolution of a one-dimensional cellular automaton ("ECA rule 54", in the jargon), starting from a random initial condition. Space goes up and down, and time advances from left to right. What you can see is that, most of the system is soon dominated by patches of a single repeating regular pattern, called a "domain". Technically, each domain is defined by a "regular language" (a certain kind of rule describing the pattern), which can extend indefinitely across the lattice, and persist indefinitely in time under the action of the cellular automata rule. ("The regular language is invariant under the time evolution".) There are also things moving through the domains ("particles"), which are another kind of structure. All this is, in this case, reasonably easy to make out by eye. You'll also notice, if you look long enough, that every once in a while the domains are disrupted seemingly out of nowhere. Since the rule is deterministic, there has to be a reason, and it turns out, if you look quite carefully, that there are multiple phases the domain could be in, that the boundaries between regions of different phase act like particles, and you're seeing the collision of those phase defects.

As I've discussed before, understanding the particles and domains of such systems is important in understanding their dynamics, and still more important in grasping their computational properties — particles and their collisions are the components people use to build computational circuits in cellular automata, and appear spontaneously in CA evolved to do non-trivial computations. Accordingly, there's a fair amount of theory now for the regular-language patterns of one-dimensional deterministic cellular automata with known rules. (Important early contributions were made by, inter alia, Wolfram, Grassberger, and Boccara; the most general theory I know of was developed by Hanson and Crutchfield. Ilachinski's textbook on CA has a pretty good review, but it's still a live subject, witness Pivato's recent work on particle kinematics.) But "general theory" here means a general framework, where all the details still have to be filled in by hand, case-by-case, after intensive communion with the pictures like that figure, and with mathematical objects like the regular-language evolution operator induced by the CA. In the end, you can build a little filter which will scan over configurations produced by the system and identify the domains and particles. (Here is a figure showing the domain and the filter, from an old paper I wrote with Wim Hordijk and Jim Crutchfield.)

And then you turn to another system, like this one (ECA rule 110), and you have to do everything over again, because all the work you've done is completely dependent on that particular system. Use your old filter on the new data, and you get nonsense.

Now, unlike my co-authors, I am lazy, which means I don't like putting that much work into figuring out the coherent structures in one system, let alone many. This is the kind of thing I want a computer to do for me, with as little input or insight on my part as possible. (As Larry Wall has said about Perl programming, this sort of laziness implies a negative discount rate: you do work now so your future self won't have to.) More seriously, the primate visual cortex is a remarkable thing, and does a marvelous job of analyzing the kinds of patterns needed to get East African Plains Apes through their natural life-cycles, but it was never supposed to cope with massive collections of high-dimensional multi-variate data, which is what science increasingly is faced with. (Talk to, say, these people if you don't believe me.) Something more automatic and principled is deeply to be desired. We tried to find a generic way of locating the places where interesting, important things were happening, on the grounds that the most interesting and important things in the system are the coherent structures. In fact, we found two ways of doing this, which turn out to be quite distinct. (I was sure, before we actually had any results, that they'd turn out to be two ways of getting at the same aspects of the system, which shows it's a good thing wiser heads were involved.)

The first quantity, which we ended up calling "local sensitivity", tries to quantify interest and importance in the sense of "having a lot of influence on the rest of the system" and "small changes here make a big difference". In classical dynamics, you quantify things like this with the Lyapunov exponents, but for a number of reasons, explored in the paper, we ended up needing something different. Basically, we perturb a small-ish region in the vicinity of a given point, and then see how large an area is affected by the perturbation over a certain interval of time; the bigger that area is, compared to how large it possibly could be, the larger the sensitivity at the point in question. Areas of high sensitivity are ones where small perturbations influence the future evolution of large parts of the system; they tend to drive their neighbors, rather than be driven by them.

The other quantity is the "local statistical complexity", in essence the number of bits of information about the past of a given point needed to optimally predict its future behavior. You might worry that this is not an objectively well-defined quantity, but we'd earlier shown how to put those fears to rest: we showed how to reconstruct the effect state space at each point ("causal state reconstruction"), and then use some information theory and the idea of a minimal sufficient statistic to show this gives an objective forecasting complexity. The details are too technical to go into here (though the connection between physical complexity and statistical inference is pleasing), so if you're really interested I'll refer you to the paper. Complex regions, in this sense, are ones where a great deal of information about the past is required for optimal prediction — where a lot of the past is relevant to the future, and fine distinctions have to be drawn between similar histories.

In practical terms, what we did was take the original CA configurations, and then compute the values of these two fields — local sensitivity and local statistical complexity — at each point in space, at each moment in time, over and over again, and then compare those results to the original field. Here, for example, is what we get looking at the first example (ECA 54) — in order, the original system (repeated here for comparison), the system as filtered for sensitivity, and as filtered for complexity. (The darker points are more sensitive or more complex, respectively.)

Looking at this, all the little defects just pop right out, even though the filters don't know anything about the phase structure of the background domain, or even that there is a background domain. Now, when we apply exactly the same filters to the second example (ECA 110), this is what we get:

In other words, we see the particles cleanly separated from their regular domain backgrounds, and the particle collisions/interactions as well; they're as complex or even more complex than the particles. Since the interactions are what you use to build a universal computer in this CA rule, that's pleasing, but secondary.

Which is all very well, but there are remarkably few one-dimensional pattern-forming systems in nature, and the regular-language story gets weird and unsatisfying in two or more dimensions. Do our filters work in more than one dimension? Well, it depends what you mean by work. The defining math has no problem in higher dimensions, but how do we know that the structures they find are the right, important ones? We needed a two-dimensional dynamical system where people had already figured out what the important structures were. For a number of reasons, we chose cyclic cellular automata (CCA) — partly because they self-organize into cute spiral waves, partly because some clever mathematicians had already thoroughly studied the system, and partly because some of us had already written papers about them and felt comfortable with them.

You really need a movie to appreciate what CCA do, but my skills don't extend to creating one, so I'll just recommend that you download Mcell and see for yourself. Here, in lieu of that, are a few snap-shots. You can see the random-noise initial conditions,

the first growth of spirals,

and their eventual domination of the whole lattice.

If you're a statistical physicist, you're indoctrinated into a particular way of thinking about coherent structures. You think of them as topological defects in the order parameter field, places where the free energy associated with the order parameter gets very high; and the order parameter is the variable which measures the symmetry broken by the background pattern. This is part of an elaborate, immensely-successful circle of ideas connecting broken symmetry, order parameters, phase transitions, universality, thermodynamic potentials, generalized elasticity and hydrodynamics — what have been called the "basic notions of condensed matter physics". But the application of these ideas involves an immense amount of trial and error, too, and tradition. When you're confronted with a new system, your first question is "which system does it remind me of?", hoping that someone else has already identified the right order parameter and effective free energy for that system. You then try to make them work for yours, find that they don't, and start tinkering with them until, if you are very good, things start to click. Making everything work out of equilibrium is a bear, and even if the next system you encounter looks similar, there's no guarantee that the same order parameters will in fact work there. (While we were writing this, we ran across this paper on a system with a similar phenomenology to our CCA. We tried their order parameter on our case, and the results were horrible.)

Nobody had worked out the order parameter for CCA before, but we wanted to compare the results of our filters to more physical ideas, so we needed to come up with one. This was basically the work of Rob Haslinger, who (unlike me) is actually very good at real physics. After remarkably few tries, he came up with something which fits the end-state spiral configurations very well (a kind of discretized XY model, if you're into this kind of thing). Here is a typical image of one of those long-run configurations, on the left, and the free energy field corresponding to the order parameter on the right.

Now here is what we get from the sensitivity and complexity filters. Let me stress, again, that these are the same filters we applied to the 1D case — we really, honest, cross our hearts didn't cheat, tweak or adapt at all.

Clearly, both our filters pick out the right structures — though in somewhat different orders. It turns out that you need a lot of information to predict what's happening at the boundaries between spirals — essentially because you've got out-of-phase regions right next to each other, and so you need to think hard to see what things at the boundaries will do — but they're not very sensitive. Make a small perturbation on a boundary, and it's going to get erased quickly by the waves radiating out from the spiral cores. The story with the cores is just the other way around: you need less information to predict them (though more than background), but disrupting one screws with the whole spiral, making them highly sensitive and autonomous.

I mentioned that the two filters give complementary information; they look at different kinds of system properties, as we say in the abstract, and their values are basically uncorrelated. (You can find correlation coefficients in the paper.) Like I said, I'd expected there would be a strong correlation at least — that places where optimal prediction needs a lot of information would be places where small perturbations have a large effect, and vice versa. My co-authors were not convinced, and, as it happens, right. Remarkably enough, however, if you look at the pictures, they're manifestly identifying the right structures, but in different ways, like I just explained in the spiral case. (We go into the contrast between the two filters in some depth in the paper.)

The fact that the order parameter field looks almost the same as the complexity field is, in my humble opinion, especially noteworthy. Let's see them again.

The first is the product of Rob's insight, and the accumulated wisdom of the statistical-physical community; it is also a function of the current lattice configuration alone. The second is an entirely automatic product of our code, and the definition of local statistical complexity, and depends on the past time-evolution of each point's local neighborhood. It is not at all obvious that there should be any relationship between them at all. I admit that they're not exactly the same — they're only strongly correlated, not identical — but the estimates of statistical complexity are subject to finite sample noise, and I'd be willing to bet the difference is due to that. It would be grossly irresponsible at this point to claim that reconstructing the causal states gives us a way of automatically finding good order parameters, but the identity between those figures suggests something like that might be worth looking at.

Also worth looking into, even more, is actual data. Because the sensitivity filter depends on making perturbations, and lots of them, it's probably not suitable for many experimental situations. But the local complexity filter just needs observations, so in principle any experiments producing sufficiently fast movies could work. (Time resolution is important because we need to ensure that a change at one point in one frame can only affect a finite region in the next frame, or else our techniques break down. We'd also need to somehow discretize space and color, but digital cameras do that to you anyway.) For it to be really meaningful, you'd need either an effectively two-dimensional system, or three-dimensional imaging data; we've got some ideas on both fronts, but are open to suggestions. We also have some ideas about weakening the requirement of having a movie, so we could work with just a representative ensemble of snapshots, but they're not ready yet for public consumption.

8 March 2006: The paper is now printed, with very minor corrections; I've updated the citation above, but here it is again: Physical Review E 73 (2006): 036104.

20 December 2012: Further to the theme.

Manual trackback: The Quantum Pontiff; Brad DeLong; The Statistical Mechanic

Complexity; Enigmas of Chance; Physics; Self-Centered

Posted at August 19, 2005 14:26 | permanent link

Three-Toed Sloth