I feel like an accident

I work on methods for building predictive models from data generated by stochastic processes, and applying those models to questions about neural information processing, self-organization in cellular automata, and so forth. All of this is about using tools from probability, statistics, and machine learning to understand large, complex, nonlinear dynamical systems. This is why my dissertation was in statistical physics, but I now teach in a statistics department.

My departmental homepage has a fuller explanation of what I do, and how I came to do it, in terms which (I hope) make sense to statisticians.

Curriculum Vitae/Resume: Lists my publications, talks given, and other professional data.

My doctoral dissertation, Causal Architecture, Complexity and Self-Organization in Time Series and Cellular Automata (2001). A unified presentation of the material from my previous papers on computational mechanics, plus several chapters of until-then unpublished work. Includes quantitative definitions of "emergence" and "self-organization", of which I am fairly fond.

[1] Jim Crutchfield & CRS,
"Thermodynamic Depth of Causal States: Objective Complexity via Minimal
Representation", Physical Review E **59** (1999):
275--283 (PDF), cond-mat/9808147. It
explains why thermodynamic depth, a notion advanced by the late, great Heinz
Pagels, while nifty in concept and certainly not just Yet Another Complexity
Measure, doesn't work very well as originally proposed, and what needs to be
changed to make it work, namely adding causal states. (In the words of the
poet, "we subsume what we do not obliterate.")

[2] Cris Moore, Mats Nordahl, Nelson Minar & CRS,
"Entropic Coulomb Forces in Ising and Potts Antiferromagnets and Ice
Models," cond-mat/9902200;
final published version, "Vortex Dynamics and Entropic Forces in
Antiferromagnets and Antiferromagnetic Potts Models," Physical Review
E **60** (1999): 5344--5351. (To be honest, I didn't write
any of this, but did a large chunk of the simulation work.) Only of interest
to people who care about pretty abstract models in statistical mechanics.

[3] Jim Crutchfield, Dave Feldman
& CRS, "Comment on `Simple Measure for Complexity,' " chao-dyn/9907001 =
Physical Review E **62** (2000): 2996--2997. Short,
critical remarks on Yet Another Complexity Measure.

[4] CRS & Jim Crutchfield, "Computational
Mechanics: Pattern and Prediction, Structure and Simplicity," Journal of
Statistical Physics **104** (2001): 817--879 (PDF), cond-mat/9907176. I'll let
Mathematical Reviews describe this one for me:

The main concern of this article, written in a very general setting, is how to maximally compress the past of a random process still keeping all the useful information, so as to predict its future as well as if all the past were remembered. In this connection the authors introduce "causal states" into which all the space of possible past histories is cut. Within each causal state all the individual histories have one and the same conditional distribution for future observables. An epsilon-machine is defined which uses the causal states for prediction. The authors prove several theorems about them, whose intuitive meaning is natural: that causal states are maximally prescient, that they have the minimal statistical complexity among all prescient rivals, that they are unique, that epsilon-machines have the minimal entropy among all the prescient rivals and some inequalities. This voluminous article, containing 140 references, may also be used as a survey in the area of abstract theory of computational complexity of prediction.
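The partition the review describes can be illustrated with a toy sketch. This is not CSSR or any code from the paper: it uses fixed-length histories and futures, plug-in frequency estimates, and a crude tolerance `tol` in place of a real hypothesis test. The function name and all parameters are hypothetical.

```python
from collections import Counter, defaultdict

def group_histories(seq, hist_len=2, future_len=1, tol=0.05):
    """Group length-`hist_len` histories of the string `seq` whose empirical
    distributions over the next `future_len` symbols agree to within `tol`.
    A finite-length, plug-in caricature of causal states (assumption: true
    causal states condition on the whole past and the whole future)."""
    counts = defaultdict(Counter)
    for i in range(hist_len, len(seq) - future_len + 1):
        counts[seq[i - hist_len:i]][seq[i:i + future_len]] += 1
    futures = sorted({f for c in counts.values() for f in c})
    dists = {h: [c[f] / sum(c.values()) for f in futures]
             for h, c in counts.items()}
    states = []  # each entry: (representative distribution, member histories)
    for h in sorted(dists):
        for rep, members in states:
            # Merge h into the first state whose distribution is close enough.
            if max(abs(p - q) for p, q in zip(rep, dists[h])) <= tol:
                members.append(h)
                break
        else:
            states.append((dists[h], [h]))
    return [members for _, members in states]

seq = "011" * 40  # a period-3 sequence
one_step = group_histories(seq, hist_len=2, future_len=1)
two_step = group_histories(seq, hist_len=2, future_len=2)
```

On this period-3 sequence, looking one symbol ahead merges the histories `01` and `10`, while looking two symbols ahead separates all three histories, which is why the definition conditions on the distribution of the entire future and not just the next symbol.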

[5] CRS & Bill
Tozier, "A Simple Model of the Evolution of Simple Models of
Evolution," adap-org/9910002; accepted by
JWAS;
rejected by Theoretical Population Biology for lack of decorum. A
not *entirely* unserious critique of recent attempts to model evolution
by physicists who don't know biology.

[6] CRS & Jim Crutchfield, "Pattern Discovery and Computational Mechanics," cs.LG/0001027. Why people who are interested in machine learning should care about what we do. Amicably rejected by the Proceedings of the 17th International Conference on Machine Learning, with remarks on the order of "interesting, but you really need to say more about how to code it up," which was fair enough, and provided some of the impetus for "An Algorithm for Pattern Discovery" and "Blind Construction" (below).

[7] CRS & Jim Crutchfield, "Information Bottlenecks, Causal
States, and Statistical Relevance Bases: How to Represent Relevant Information
in Memoryless Transduction," nlin.AO/0006025. Discussion of
several related ways of extracting the information one variable contains about
another, and using it to model the functional relationship or transducer
connecting the two. Advances in Complex Systems
**5** (2002): 91--95.

[8] Wim Hordijk, CRS &
Jim Crutchfield, "Upper Bound on the Products of Particle Interactions in
Cellular
Automata", Physica D **154** (2001): 240--258, nlin.CG/0008038. A proof of a
limit on how complicated the interactions between propagating emergent
structures in (one-dimensional) CAs can get. Wim is too modest to call it
Hordijk's Rule, so I will.

[9] CRS and Dave
Albers, "Symbolic Dynamics for Discrete Adaptive
Games," cond-mat/0207407,
SFI Working Paper 02-07-031. Why the hyper-stylized game-theory models which
have taken over econophysics in the last few years are *not* complex,
dynamically or statistically. (We were going to call it "no chaos and little
complexity in the minority game," but settled on something more neutral.)
Which isn't to say they're not worth studying; just that they need to justify
themselves by what they can tell us about real(er) systems. Submitted to
Physics Letters A. (Update, 2003: Revised to placate an unusually
appalling referee.)

[10] CRS, Kristina L. Klinkner and Jim Crutchfield, "An Algorithm for Pattern Discovery in Time Series," cs.LG/0210025. (This version supersedes the SFI Working Paper one.) A statistically reliable, linear-time algorithm for inferring causal states from data. The code and documentation are available, released under the GNU General Public License. I'd recommend reading the "Blind Construction" paper first, since I think that has a clearer presentation of the algorithm and its motivation.

[11] CRS and Cris Moore, "What Is a Macrostate? Subjective Measurements and Objective Dynamics," cond-mat/0303625; also PITT-PHIL-SCI-1119 at the Phil-Sci Archive. Why thermodynamic macrostates are neither completely objective nor (as some argue) completely epistemic, but are instead causal states, in the sense of computational mechanics. Submitted to Studies in History and Philosophy of Modern Physics.

[12] CRS and Kristina L. Klinkner, "Quantifying Self-Organization in Cyclic Cellular Automata," pp. 108--117 in Lutz Schimansky-Geier, Derek Abbott, Alexander Neiman and Christian Van den Broeck (eds.), Noise in Complex Systems and Stochastic Dynamics (Bellingham, Washington: SPIE, 2003), part of the proceedings of Fluctuations and Noise 2003. A preliminary report on the work that became "Quantifying Self-Organization with Optimal Predictors", below, which however has more details on the algorithm and related literature, because we were less space-constrained. nlin.AO/0507067.

[13] "Optimal Nonlinear Prediction of Random Fields on Networks," for the
conference Discrete Models for
Complex Systems 2003, printed
in Discrete Mathematics and Theoretical
Computer Science, **AB(DMCS)** (2003): 11--30.
Available online from either
the journal or
arxiv.org
(math.PR/0305160).

[14] "Methods and Techniques of Complex Systems Science: An Overview",
chapter 1 (pp. 33--114) in Thomas S. Deisboeck and J. Yasha Kresh
(eds.), Complex Systems Science in Biomedicine (NY: Springer,
2006), nlin.AO/0307015. A
summary of the tools people *should* use to study complex systems,
covering statistical learning and data-mining, time series analysis, cellular
automata, agent-based models, evaluation techniques and simulation, information
theory and complexity measures, with 288 references (a personal record).

[15] CRS and Kristina L. Klinkner, "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", pp. 504--511 in Max Chickering and Joseph Halpern (eds.), Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference, cs.LG/0406011 (Arlington, Virginia: AUAI Press, 2004). An eight-page paper on CSSR (Causal-State Splitting Reconstruction), including experimental comparison to the standard heuristic of fitting hidden Markov models via the EM algorithm, and selecting among them with cross-validation. (We're better.) I think this is the clearest description yet of the algorithm, though proofs were omitted to save space.

[16] CRS, Kristina L. Klinkner and Rob
Haslinger, "Quantifying Self-Organization with Optimal
Predictors", Physical Review Letters **93** (2004):
118701, nlin.AO/0409024.
Why self-organization should be identified with increasing complexity over
time, and how to measure that complexity by measuring the amount of information
needed for optimal prediction. This is the experiment I said I was going to do
at the end of "Is the Primordial Soup Done Yet?", after only eight years. But
one of those years we spent waiting for the referees to see the light, so it
doesn't count. With neat color pictures!

[17] "The Backwards Arrow of Time of the Coherently Bayesian Statistical
Mechanic", cond-mat/0410063.
Why we should *not* identify thermodynamic entropy with uncertainty
(Shannon entropy) in a distribution over microstates. Doing so, in combination
with the ordinary laws of motion and Bayesian probability updating, shows that
entropy is non-increasing. Replacing Bayesian updating with repeated entropy
maximization is bad statistics, and actually makes things worse.

[18] Michael
T. Gastner, CRS and M. E. J. "Mark" Newman,
"Maps and cartograms of the 2004 US presidential election
results", Advances in Complex Systems, **8** (2005):
117--123 [PDF
preprint, web page]. The only work I have ever
done, or likely will do, which generated hate mail.

[19] Kristina L. Klinkner, CRS
and Marcelo Camperi, "Measuring
Shared Information and Coordinated Activity in Neuronal Networks", pp. 667--674
in Yair Weiss, Bernhard Schölkopf and John C. Platt (eds.), Advances
in Neural Information Processing Systems 18 (MIT Press, 2006),
a.k.a. NIPS
2005, q-bio.NC/0506009.
The best way to measure those things is to look at the mutual information
between the causal states of the different neurons, suitably normalized: this
handles nonlinear, stochastic relationships between extended patterns of
behavior without any fuss or issues, and extends naturally to truly global
measures of coordination, not just pairwise averages. (Plus, we can find the
causal states with CSSR.) Because
practice is the sole criterion of truth, we also show that this "informational
coherence" works *very nicely* on a model of beta and gamma rhythms,
where standard approaches get confused.
(More.)
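A minimal sketch of a normalized mutual information between two state sequences, for illustration only. It assumes the causal states have already been inferred and are given as two equal-length label sequences, uses plug-in entropy estimates, and takes the normalizer to be the smaller of the two marginal entropies; that normalizer and the function names are assumptions, not taken from the paper's code.

```python
import math
from collections import Counter

def entropy(xs):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def informational_coherence(xs, ys):
    """Mutual information between paired state labels, divided by the smaller
    marginal entropy, so the result lies between 0 and 1."""
    if len(xs) != len(ys):
        raise ValueError("state sequences must have equal length")
    hx, hy = entropy(xs), entropy(ys)
    hxy = entropy(list(zip(xs, ys)))  # joint entropy of the paired labels
    denom = min(hx, hy)
    if denom == 0:
        return 0.0  # a constant sequence shares no information
    return (hx + hy - hxy) / denom

locked = informational_coherence([0, 1] * 20, [1, 0] * 20)
unrelated = informational_coherence([0, 0, 1, 1], [0, 1, 0, 1])
```

Two perfectly anti-correlated state sequences score 1, and two sequences whose labels are empirically independent score 0, regardless of whether the relationship is linear.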

[20] Matthew J. Berryman, Scott W. Coussens, Yvonne Pamula, Declan Kennedy, Kurt Lushington, CRS, Andrew Allison, A. James Martin, David Saint and Derek Abbott, "Nonlinear Aspects of the EEG During Sleep in Children", pp. 40--48 in Nigel G. Stocks, Derek Abbott and Robert P. Morse (eds.), Fluctuations and Noise in Biological, Biophysical, and Biomedical Systems III (Bellingham, Washington: SPIE, 2005), q-bio.NC/0506015.

[21] "Functionalism, Emergence, and Collective Coordinates",
Behavioral and
Brain Sciences **27** (2004): 635--636. This is part
of the peer commentary (pp. 627--637, same issue) on Don Ross and David
Spurrett's "What to Say to a Skeptical Metaphysician: A Defense Manual for
Cognitive and Behavioral Scientists" (pp. 603--627), and is
followed by Ross and Spurrett's reply to comments (pp. 637--647). [PDF]

[22] CRS, Rob
Haslinger, Jean-Baptiste
Rouquier, Kristina L. Klinkner
and Cristopher Moore, "Automatic
Filters for the Detection of Coherent Structure in Spatiotemporal
Systems", Physical Review E **73** (2006):
036104, nlin.CG/0508001.
Two different filters --- one based on local perturbation, the other based on
statistical forecasting complexity --- which are both able to recover the known
coherent structures of various cellular automata, without using prior knowledge
of the rule or the structures. See
the auto-vulgarization for more, but not so
much more as the paper. (To meet the arxiv's requirements, we had to replace
our original figures with highly-compressed
jpegs. Here is a PDF version with the original,
full-resolution figures; it is, oddly enough, more than a megabyte smaller than
the arxiv-generated PDF.)
The source code is
available, in Objective Caml.

[23] CRS, Marcelo F. Camperi and Kristina L. Klinkner,
"Discovering Functional Communities in Dynamical
Networks", q-bio.NC/0609008
= pp. 140--157 in Anna
Goldenberg, Edo
Airoldi, Stephen
E. Fienberg, Alice
Zheng, David M. Blei and
Eric P. Xing
(eds.), Statistical Network Analysis: Models, Issues and New
Directions (New York: Springer-Verlag, 2007), the
proceedings of the ICML 2006 workshop
of the same name. How
the combination of informational coherence (see Klinkner *et al.* above)
and community discovery
algorithms lets you identify *functional* modules, rather than just
anatomical ones. We just look at an example from computational neuroscience
here, but there is no intrinsic reason you couldn't do this for any kind of
network dynamical system.

[24] "Maximum Likelihood Estimation for *q*-Exponential (Tsallis)
Distributions", math.ST/0701854, submitted
to Physical Review E. Tsallis's
*q*-exponential distributions are heavy-tailed distributions which are
related to the Pareto distributions (in fact, a special case of what some
people call a "type II generalized Pareto"). They've recently become a big
deal in statistical
physics --- much too
big, if you ask me. But if you *do* want to use them with real
data, this is the right way to do it. The paper is accompanied
by free, open-source code, written
in R, implementing the maximum
likelihood estimator and bootstrapped standard errors and confidence intervals.

[25] Aaron Clauset, CRS,
and M. E. J. Newman,
"Power-law distributions in empirical
data", arxiv:0706.1062,
SIAM
Review **51** (2009): 661--703. What power laws are
(not just sort of straight lines on log-log plots), how to estimate their
parameters from data (use maximum likelihood, not linear regression), and how
to tell if you have one (by actual hypothesis testing).
With accompanying code
in R and Matlab. (More.)
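As a rough illustration of the "maximum likelihood, not linear regression" point, here is a sketch of the standard continuous power-law MLE for the exponent, with the lower cutoff `xmin` assumed known. This is not the accompanying R or Matlab code: choosing `xmin` from the data and the goodness-of-fit test are left out, and the function name is hypothetical.

```python
import math

def fit_power_law_alpha(data, xmin):
    """Maximum likelihood estimate of alpha for p(x) ~ x**(-alpha), x >= xmin,
    with its large-sample standard error. `xmin` is assumed known here."""
    if xmin <= 0:
        raise ValueError("xmin must be positive")
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("need observations strictly above xmin")
    alpha = 1 + n / log_sum
    stderr = (alpha - 1) / math.sqrt(n)
    return alpha, stderr

# Deterministic check: evenly spaced quantiles of a power law with alpha = 2.5.
n, true_alpha, xmin = 10_000, 2.5, 1.0
sample = [xmin * (1 - (i + 0.5) / n) ** (-1 / (true_alpha - 1)) for i in range(n)]
alpha_hat, se = fit_power_law_alpha(sample, xmin)
```

The estimator only uses the logarithms of the observations above the cutoff, so no binning or log-log plotting is involved.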

[26] "Social Media as Windows on the Social Life of the Mind", arxiv:0710.4911, to appear in the AAAI spring 2008 symposium on social information processing. A programmatic paper, pulling together some ideas about cultural diffusion in networks and collective cognition, and how those topics might be studied with data from user-driven Web sites.

[27] Rob Haslinger, Kristina
L. Klinkner and CRS, "The Computational Structure of Spike
Trains", Neural
Computation **22** (2010):
121--157, arxiv:1001.0036.
Applying CSSR to real neural spike trains
to recover their computational structure and statistical complexity, and detect
the influence of external stimuli.

[28] "Dynamics of Bayesian Updating with Dependent Data and Misspecified
Models",
Electronic Journal of
Statistics **3** (2009):
1039--1074, arxiv:0901.1342.
What happens when all of your models are wrong, but you use Bayesian updating
to weight them anyway? Answer: a process of natural selection, in which
fitness is proportional to likelihood. This actually converges on the models
with the smallest relative entropy rate, if your prior distribution is smart
enough to respect a sieve (but then non-parametric non-Bayesian methods work
too). (More.)
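The "natural selection" reading can be seen in a toy sketch: each model's posterior weight is multiplied by its likelihood for the new observation and then renormalized. The example below assumes a finite set of Bernoulli models, none of which matches the data source; the function name and the data are hypothetical, and nothing here reproduces the paper's general setting or proofs.

```python
import math

def bayes_update_path(data, models, prior):
    """Posterior weights over a finite set of Bernoulli models after each
    observation. `models` holds success probabilities strictly in (0, 1)."""
    log_w = [math.log(p) for p in prior]
    path = []
    for x in data:
        # "Fitness" step: add each model's log-likelihood of the observation.
        log_w = [lw + math.log(q if x == 1 else 1 - q)
                 for lw, q in zip(log_w, models)]
        # Renormalize, subtracting the max for numerical stability.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        path.append([wi / z for wi in w])
    return path

# A deterministic, dependent sequence with 75% ones; no model has q = 0.75.
data = [1, 1, 0, 1] * 50
path = bayes_update_path(data, models=[0.2, 0.6, 0.9], prior=[1 / 3] * 3)
final = path[-1]
```

Here the weight piles up on `q = 0.6`, the wrong model that is closest to the data in relative entropy, even though every model in the set is misspecified.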

[29] Shinsuke Koyama, Lucia
Castellanos Pérez-Bolde, CRS
and Robert E. Kass, "Approximate
Methods for State-Space
Models", Journal of
the American Statistical Association **105** (2010):
170--180, arxiv:1004.3476.
When modeling an observed time series as a noisy function of a hidden Markov
process, you need to estimate the state. This is called "filtering", and doing
it exactly involves some generally-intractable integrals. We approximate those
integrals via Laplace's method, giving us a "Laplace-Gaussian filter". This is
surprisingly fast, accurate, and stable over time, and works well in a neural
decoding example.

[30] CRS and Andrew C. Thomas, "Homophily
and Contagion Are Generically Confounded in Observational Social Network
Studies", Sociological
Methods and Research **40** (2011):
211--239, arxiv:1004.4704.
Individuals near each other in a social network tend to behave similarly; you
can predict what one of them will do from what their neighbors do. Is this
because they are influenced by their neighbors ("contagion"), or because social
ties tend to form between people who are already similar ("homophily"), and so
act alike, or some of both? We show that observational data can hardly ever
answer this question, unless accompanied by very strong assumptions, like
measuring everything that leads people to form social ties.
(More.)

[31] Andrew Gelman and
CRS, "Philosophy and the practice of Bayesian
statistics", British Journal of Mathematical and Statistical Psychology **66**
(2013): 8--38, arxiv:1006.3868
(with commentaries and response). What bugs me the most about many
presentations of Bayesianism is the pretense that it gives an automatic method
of induction, that all you need or should rationally want is the posterior
probability of your theory. For actual mortals, testing, checking and revising
your model remains essential, and this matches what good Bayesian data analysts
actually *do*, though their ideology discourages them from doing it.

[32] "Scaling and Hierarchy in Urban Economies", arxiv:1102.4101.

[33] Daniel J. McDonald, CRS and Mark Schervish, "Estimating beta-mixing coefficients", pp. 516--524 in the 14th Conference on Artificial Intelligence and Statistics (AISTATS 2011), arxiv:1103.0941

[34] Daniel J. McDonald, CRS and Mark Schervish, "Generalization error bounds for stationary autoregressive models", arxiv:1103.0942

[35] CRS, Abigail Z. Jacobs, Kristina L. Klinkner, and Aaron Clauset, "Adapting to Non-stationarity with Growing Expert Ensembles", arxiv:1103.0949

[36] Daniel J. McDonald, CRS and Mark Schervish, "Risk bounds for time series without strong mixing", arxiv:1106.0730

[37] CRS and Alessandro
Rinaldo, "Consistency under Sampling of Exponential Random Graph
Models", Annals of
Statistics **41** (2013): 508--535, arxiv:1111.3054. More.

[38] CRS and Aryeh (Leonid) Kontorovich, "Predictive PAC Learning and Process Decompositions", pp. 1619--1627 in NIPS 2013, arxiv:1309.4859

[39] Georg M. Goerg, CRS and Larry Wasserman, "Lebesgue Smoothing" (by request)

[40] Henry Farrell and CRS, "Selection, Evolution, and Rational Choice Institutionalism" (by request)

[41] Daniel J. McDonald, CRS and Mark Schervish, "Estimating Beta-Mixing Coefficients via Histograms",
Electronic Journal of
Statistics **9** (2015):
2855--2883, arxiv:1109.5998

[42] Daniel J. McDonald, CRS and Mark Schervish, "Estimated VC Dimension for Risk Bounds", arxiv:1111.3404. More.

[43] "Comment on `Why and When ``Flawed'' Social Network Analyses Still
Yield Valid Tests of No
Contagion'", Statistics,
Politics, and Policy **3** (2012): 5
[PDF reprint]

[44] Georg M. Goerg and CRS, "LICORS: Light Cone Reconstruction of States for Non-parametric Forecasting of Spatio-Temporal Systems", arxiv:1206.2398

[45] Xiaoran Yan, CRS, Jacob E. Jensen, Florent Krzakala, Cristopher Moore, Lenka Zdeborova, Pan Zhang and Yaojia Zhu, "Model Selection for Degree-corrected Block Models", Journal of Statistical Mechanics 2014: P05007, arxiv:1207.3994

[46] Georg M. Goerg and CRS, "Mixed LICORS: A Nonparametric Algorithm for Predictive State Reconstruction", pp. 289--297 in AISTATS 2013, arxiv:1211.3760

[47] Daniel J. McDonald, CRS, and Mark J. Schervish, "Time series forecasting: model evaluation and selection using nonparametric risk bounds", arxiv:1212.0463

[48] Leila Wehbe, Aaditya Ramdas, Rebecca C. Steorts and CRS, "Regularized
Brain Reading with Shrinkage and
Smoothing", Annals of
Applied Statistics **9** (2015):
1997--2022, arxiv:1401.6595

[49] Dena Asta and CRS, "Geometric Network Comparison", pp. 102--110 in UAI 2015, arxiv:1411.1350

[50] George D. Montanez and CRS, "The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction", arxiv:1506.02686

[51] Christopher N. Warren, Daniel Shore, Jessica Otis, Lawrence Wang, Mike
Finegold and CRS, "Six Degrees of Francis Bacon: A Statistical Method for
Reconstructing Large Historical Social
Networks", Digital
Humanities Quarterly **10:3** (2016)

[52] CRS and Edward McFowland III, "Controlling for Latent Homophily in Social Networks through Inferring Latent Locations", arxiv:1607.06565