Cosma Shalizi's Thesis

Causal Architecture, Complexity, and Self-Organization in Time Series and Cellular Automata

Abstract: All self-respecting nonlinear scientists know self-organization when they see it: except when we disagree. For this reason, if no other, it is important to put some mathematical spine into our floppy intuitive notion of self-organization. Only a few measures of self-organization have been proposed; none can be adopted in good intellectual conscience.
To find a decent formalization of self-organization, we need to pin down what we mean by organization. The best answer is that the organization of a process is its causal architecture --- its internal, possibly hidden, causal states and their interconnections. Computational mechanics is a method for inferring causal architecture --- represented by a mathematical object called the \epsilon-machine --- from observed behavior. The \epsilon-machine captures all patterns in the process which have any predictive power, so computational mechanics is also a method for pattern discovery. In this work, I develop computational mechanics for four increasingly sophisticated types of process --- memoryless transducers, time series, transducers with memory, and cellular automata. In each case I prove the optimality and uniqueness of the \epsilon-machine's representation of the causal architecture, and give reliable algorithms for pattern discovery.
The \epsilon-machine is the organization of the process, or at least of the part of it which is relevant to our measurements. It leads to a natural measure of the statistical complexity of processes, namely the amount of information needed to specify the state of the \epsilon-machine. Self-organization is a self-generated increase in statistical complexity. This fulfills various hunches which have been advanced in the literature, seems to accord with people's intuitions, and is both mathematically precise and operational.

Introduction
Measuring Pattern, Complexity and Organization
Memoryless Transducers
Time Series
A Reconstruction Algorithm
Connections
Transducers
A Very Brief Introduction to Cellular Automata
Domains and Particles: Spatial Computational Mechanics
Spatio-temporal Computational Mechanics
Conclusion

It's 182 pages, including table of contents, references, figures, etc.

Postscript (3.4M; gzipped, 819k)
PDF (1.4M; gzipped, 1.1M)

Updates

If you're interested in the material in Chapter 5 ("A Reconstruction Algorithm"), there is a clearer presentation of a much-improved algorithm in my paper with Kristina Lisa Klinkner, "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", in Max Chickering and Joseph Halpern (eds.), Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI 2004), pp. 504--511 (Arlington, VA: AUAI Press, 2004), cs.LG/0406011. There's C++ source code available.

If you're interested in Chapter 10 ("Spatio-temporal Computational Mechanics"), my paper "Optimal Nonlinear Prediction of Random Fields on Networks" (Discrete Mathematics and Theoretical Computer Science AB (2003): 11--30, math.PR/0305160) extends the main results to stochastic fields on arbitrary (undirected) graphs. (The theory in that chapter forms a special case.)

If you're interested in formally defining and quantifying self-organization, especially the material in chapter 11, see my paper with Kristina Lisa Klinkner and Robert Haslinger, "Quantifying Self-Organization with Optimal Predictors", Physical Review Letters 93 (2004): 118701, nlin.AO/0409024. That paper includes a very compressed presentation of the key parts of the theory in Chapter 10, as well as experimental results on cyclic cellular automata.

The chapter 10 material is also extended in my paper with Robert Haslinger, Jean-Baptiste Rouquier, Kristina Lisa Klinkner and Cristopher Moore, "Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems", Physical Review E 73 (2006): 036104, nlin.CG/0508001. There we report experimental results on a lot of the elementary cellular automata, as well as on cyclic CA. There's Objective CAML source-code available.

Finally, I also have papers not related to my thesis.