Cosma

Research

Changing my shape
I feel like an accident

I work on methods for building predictive models from data generated by stochastic processes, and applying those models to questions about neural information processing, self-organization in cellular automata, and so forth. All of this is about using tools from probability, statistics, and machine learning to understand large, complex, nonlinear dynamical systems. This is why my dissertation was in statistical physics, but I now teach in a statistics department.

My departmental homepage has a fuller explanation of what I do, and how I came to do it, in terms which (I hope) make sense to statisticians.


Summaries

Curriculum Vitae/Resume: Lists my publications, talks given, and other professional data.

My doctoral dissertation, Causal Architecture, Complexity and Self-Organization in Time Series and Cellular Automata (2001). A unified presentation of the material from my previous papers on computational mechanics, plus several chapters of until-then unpublished work. Includes quantitative definitions of "emergence" and "self-organization", of which I am fairly fond.

Papers and Book Chapters

In order of completion. Lecture notes are on my teaching page.

[1] Jim Crutchfield & CRS, "Thermodynamic Depth of Causal States: Objective Complexity via Minimal Representation", Physical Review E 59 (1999): 275--283 (PDF), cond-mat/9808147. It explains why thermodynamic depth, a notion advanced by the late, great Heinz Pagels, while nifty in concept and certainly not just Yet Another Complexity Measure, doesn't work very well as originally proposed, and what needs to be changed to make it work, namely adding causal states. (In the words of the poet, "we subsume what we do not obliterate.")

[2] Cris Moore, Mats Nordahl, Nelson Minar & CRS, "Entropic Coulomb Forces in Ising and Potts Antiferromagnets and Ice Models," cond-mat/9902200; final published version, "Vortex Dynamics and Entropic Forces in Antiferromagnets and Antiferromagnetic Potts Models," Physical Review E 60 (1999): 5344--5351. (To be honest, I didn't write any of this, but did a large chunk of the simulation work.) Only of interest to people who care about pretty abstract models in statistical mechanics.

[3] Jim Crutchfield, Dave Feldman & CRS, "Comment on 'Simple Measure for Complexity,' " chao-dyn/9907001 = Physical Review E 62 (2000): 2996--2997. Short, critical remarks on Yet Another Complexity Measure.

[4] CRS & Jim Crutchfield, "Computational Mechanics: Pattern and Prediction, Structure and Simplicity," Journal of Statistical Physics 104 (2001): 816--879 (PDF), cond-mat/9907176. I'll let Mathematical Reviews describe this one for me:

The main concern of this article, written in a very general setting, is how to maximally compress the past of a random process still keeping all the useful information, so as to predict its future as well as if all the past were remembered. In this connection the authors introduce "causal states" into which all the space of possible past histories is cut. Within each causal state all the individual histories have one and the same conditional distribution for future observables. An epsilon-machine is defined which uses the causal states for prediction. The authors prove several theorems about them, whose intuitive meaning is natural: that causal states are maximally prescient, that they have the minimal statistical complexity among all prescient rivals, that they are unique, that epsilon-machines have the minimal entropy among all the prescient rivals and some inequalities. This voluminous article, containing 140 references, may also be used as a survey in the area of abstract theory of computational complexity of prediction.
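As a toy illustration of the partition the review describes (not the algorithm from any of the papers listed here), the sketch below groups fixed-length histories of a binary sequence by their estimated next-symbol distribution. It rests on loud simplifying assumptions: it compares only the distribution of the next symbol rather than of the whole future, it merges histories with a crude tolerance instead of a hypothesis test, and the function names and the "golden mean" test process (no two consecutive 0s) are chosen purely for illustration.

```python
from collections import Counter, defaultdict
import random


def next_symbol_probabilities(sequence, history_length):
    """Estimate P(next symbol = '1' | history) for every observed history."""
    counts = defaultdict(Counter)
    for i in range(history_length, len(sequence)):
        history = sequence[i - history_length:i]
        counts[history][sequence[i]] += 1
    return {h: c["1"] / sum(c.values()) for h, c in counts.items()}


def group_histories(prob_one, tolerance):
    """Merge histories whose estimated next-symbol probabilities agree.

    Assumption: a fixed tolerance stands in for a proper statistical test.
    """
    groups = []
    for history in sorted(prob_one):
        p = prob_one[history]
        for group in groups:
            if abs(group["p"] - p) <= tolerance:
                group["histories"].append(history)
                break
        else:
            groups.append({"p": p, "histories": [history]})
    return [sorted(g["histories"]) for g in groups]


def golden_mean_sequence(length, rng):
    """Binary sequence in which a 0 is always followed by a 1."""
    symbols = []
    previous = "1"
    for _ in range(length):
        current = "1" if previous == "0" else rng.choice("01")
        symbols.append(current)
        previous = current
    return "".join(symbols)


if __name__ == "__main__":
    rng = random.Random(0)
    seq = golden_mean_sequence(20000, rng)
    probs = next_symbol_probabilities(seq, history_length=2)
    for group in group_histories(probs, tolerance=0.1):
        print(group)
```

In this toy process the histories ending in 1 predict the same future and so fall into one group, while the history ending in 0 forms another; the never-observed history "00" simply does not appear.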

[5] CRS & Bill Tozier, "A Simple Model of the Evolution of Simple Models of Evolution," adap-org/9910002; accepted by JWAS; rejected by Theoretical Population Biology for lack of decorum. A not entirely unserious critique of recent attempts to model evolution by physicists who don't know biology.

[6] CRS & Jim Crutchfield, "Pattern Discovery and Computational Mechanics," cs.LG/0001027. Why people who are interested in machine learning should care about what we do. Amicably rejected by the Proceedings of the 17th International Conference on Machine Learning, with remarks on the order of "interesting, but you really need to say more about how to code it up," which was fair enough, and provided some of the impetus for "An Algorithm for Pattern Discovery" and "Blind Construction" (below).

[7] CRS & Jim Crutchfield, "Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction," nlin.AO/0006025. Discussion of several related ways of extracting the information one variable contains about another, and using it to model the functional relationship or transducer connecting the two. Advances in Complex Systems 5 (2002): 91--95.

[8] Wim Hordijk, CRS & Jim Crutchfield, "Upper Bound on the Products of Particle Interactions in Cellular Automata", Physica D 154 (2001): 240--258, nlin.CG/0008038. A proof of a limit on how complicated the interactions between propagating emergent structures in (one-dimensional) CAs can get. Wim is too modest to call it Hordijk's Rule, so I will.

[9] CRS and Dave Albers, "Symbolic Dynamics for Discrete Adaptive Games," cond-mat/0207407, SFI Working Paper 02-07-031. Why the hyper-stylized game-theory models which have taken over econophysics in the last few years are not complex, dynamically or statistically. (We were going to call it "no chaos and little complexity in the minority game," but settled on something more neutral.) Which isn't to say they're not worth studying; just that they need to justify themselves by what they can tell us about real(er) systems. Submitted to Physics Letters A. (Update, 2003: Revised to placate an unusually appalling referee.)

[10] CRS, Kristina L. Klinkner and Jim Crutchfield, "An Algorithm for Pattern Discovery in Time Series," cs.LG/0210025. (This version supersedes the SFI Working Paper one.) A statistically reliable, linear-time algorithm for inferring causal states from data. The code and documentation are available, released under the GNU General Public License. I'd recommend reading the "Blind Construction" paper first, since I think that has a clearer presentation of the algorithm and its motivation.

[11] CRS and Cris Moore, "What Is a Macrostate? Subjective Measurements and Objective Dynamics," cond-mat/0303625; also PITT-PHIL-SCI-1119 at the Phil-Sci Archive. Why thermodynamic macrostates are neither completely objective nor (as some argue) completely epistemic, but are instead causal states, in the sense of computational mechanics. Submitted to Studies in the History and Philosophy of Modern Physics.

[12] CRS and Kristina L. Klinkner, "Quantifying Self-Organization in Cyclic Cellular Automata," pp. 108--117 in Lutz Schimansky-Geier, Derek Abbott, Alexander Neiman and Christian Van den Broeck (eds.), Noise in Complex Systems and Stochastic Dynamics (Bellingham, Washington: SPIE, 2003), part of the proceedings of Fluctuations and Noise 2003. A preliminary report on the work that became "Quantifying Self-Organization with Optimal Predictors", below, which however has more details on the algorithm and related literature, because we were less space-constrained. nlin.AO/0507067.

[13] "Optimal Nonlinear Prediction of Random Fields on Networks," for the conference Discrete Models for Complex Systems 2003, printed in Discrete Mathematics and Theoretical Computer Science, AB(DMCS) (2003): 11--30. Available online from either the journal or arxiv.org (math.PR/0305160).

[14] "Methods and Techniques of Complex Systems Science: An Overview", chapter 1 (pp. 33--114) in Thomas S. Deisboeck and J. Yasha Kresh (eds.), Complex Systems Science in Biomedicine (NY: Springer, 2006), nlin.AO/0307015. A summary of the tools people should use to study complex systems, covering statistical learning and data-mining, time series analysis, cellular automata, agent-based models, evaluation techniques and simulation, information theory and complexity measures, with 288 references (a personal record).

[15] CRS and Kristina L. Klinkner, "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", pp. 504--511 in Max Chickering and Joseph Halpern (eds.), Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference, cs.LG/0406011 (Arlington, Virginia: AUAI Press, 2004). An eight-page paper on CSSR, including experimental comparison to the standard heuristic of fitting hidden Markov models via the EM algorithm, and selecting among them with cross-validation. (We're better.) I think this is the clearest description yet of the algorithm, though proofs were omitted to save space.

[16] CRS, Kristina L. Klinkner and Rob Haslinger, "Quantifying Self-Organization with Optimal Predictors", Physical Review Letters 93 (2004): 118701, nlin.AO/0409024. Why self-organization should be identified with increasing complexity over time, and how to measure that complexity by measuring the amount of information needed for optimal prediction. This is the experiment I said I was going to do at the end of "Is the Primordial Soup Done Yet?", after only eight years. But one of those years we spent waiting for the referees to see the light, so it doesn't count. With neat color pictures!

[17] "The Backwards Arrow of Time of the Coherently Bayesian Statistical Mechanic", cond-mat/0410063. Why we should not identify thermodynamic entropy with uncertainty (Shannon entropy) in a distribution over microstates. Doing so, in combination with the ordinary laws of motion and Bayesian probability updating, shows that entropy is non-increasing. Replacing Bayesian updating with repeated entropy maximization is bad statistics, and actually makes things worse.

[18] Michael T. Gastner, CRS and M. E. J. "Mark" Newman, "Maps and cartograms of the 2004 US presidential election results", Advances in Complex Systems, 8 (2005): 117--123 [PDF preprint, web page]. The only work I have ever done, or likely will do, which generated hate mail.

[19] Kristina L. Klinkner, CRS and Marcelo Camperi, "Measuring Shared Information and Coordinated Activity in Neuronal Networks", pp. 667--674 in Yair Weiss, Bernhard Schölkopf and John C. Platt (eds.), Advances in Neural Information Processing Systems 18 (MIT Press, 2006), a.k.a. NIPS 2005, q-bio.NC/0506009. The best way to measure those things is to look at the mutual information between the causal states of the different neurons, suitably normalized: this handles nonlinear, stochastic relationships between extended patterns of behavior without any fuss or issues, and extends naturally to truly global measures of coordination, not just pairwise averages. (Plus, we can find the causal states with CSSR.) Because practice is the sole criterion of truth, we also show that this "informational coherence" works very nicely on a model of beta and gamma rhythms, where standard approaches get confused. (More.)

[20] Matthew J. Berryman, Scott W. Coussens, Yvonne Pamula, Declan Kennedy, Kurt Lushington, CRS, Andrew Allison, A. James Martin, David Saint and Derek Abbott, "Nonlinear Aspects of the EEG During Sleep in Children", pp. 40--48 in Nigel G. Stocks, Derek Abbott and Robert P. Morse (eds.), Fluctuations and Noise in Biological, Biophysical, and Biomedical Systems III (Bellingham, Washington: SPIE, 2005), q-bio.NC/0506015.

[21] "Functionalism, Emergence, and Collective Coordinates", Behavioral and Brain Sciences 27 (2004): 635--636. This is part of the peer commentary (pp. 627--637, same issue) on Don Ross and David Spurrett's "What to Say to a Skeptical Metaphysician: A Defense Manual for Cognitive and Behavioral Scientists" (pp. 603--627), and is followed by Ross and Spurrett's reply to comments (pp. 637--647). [PDF]

[22] CRS, Rob Haslinger, Jean-Baptiste Rouquier, Kristina L. Klinkner and Cristopher Moore, "Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems", Physical Review E 73 (2006): 036104, nlin.CG/0508001. Two different filters --- one based on local perturbation, the other based on statistical forecasting complexity --- which are both able to recover the known coherent structures of various cellular automata, without using prior knowledge of the rule or the structures. See the auto-vulgarization for more, but not so much more as the paper. (To meet the arxiv's requirements, we had to replace our original figures with highly-compressed jpegs. Here is a PDF version with the original, full-resolution figures; it is, oddly enough, more than a megabyte smaller than the arxiv-generated PDF.) The source code is available, in Objective Caml.

[23] CRS, Marcelo F. Camperi and Kristina L. Klinkner, "Discovering Functional Communities in Dynamical Networks", q-bio.NC/0609008 = pp. 140--157 in Anna Goldenberg, Edo Airoldi, Stephen E. Fienberg, Alice Zheng, David M. Blei and Eric P. Xing (eds.), Statistical Network Analysis: Models, Issues and New Directions (New York: Springer-Verlag, 2007), the proceedings of the ICML 2006 workshop of the same name. How the combination of informational coherence (see Klinkner et al. above) and community discovery algorithms let you identify functional modules, rather than just anatomical ones. We just look at an example from computational neuroscience here, but there is no intrinsic reason you couldn't do this for any kind of network dynamical system.

[24] "Maximum Likelihood Estimation for q-Exponential (Tsallis) Distributions", math.ST/0701854, submitted to Physical Review E. Tsallis's q-exponential distributions are heavy-tailed distributions which are related to the Pareto distributions (in fact, a special case of what some people call a "type II generalized Pareto"). They've recently become a big deal in statistical physics --- much too big, if you ask me. But if you do want to use them with real data, this is the right way to do it. The paper is accompanied by free, open-source code, written in R, implementing the maximum likelihood estimator and bootstrapped standard errors and confidence intervals.

[25] Aaron Clauset, CRS, and M. E. J. Newman, "Power-law distributions in empirical data", arxiv:0706.1062, SIAM Review 51 (2009): 661--703. What power laws are (not just sort of straight lines on log-log plots), how to estimate their parameters from data (use maximum likelihood, not linear regression), and how to tell if you have one (by actual hypothesis testing). With accompanying code in R and Matlab. (More.)
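For readers who want to see what "maximum likelihood, not linear regression" amounts to in the simplest case, here is a minimal sketch. It is not the accompanying R or Matlab code. It assumes a continuous power law with a known lower cutoff xmin, for which the estimate has the closed form alpha = 1 + n / sum(ln(x_i / xmin)); estimating xmin and testing goodness of fit are left out entirely. The function names are hypothetical.

```python
import math
import random


def fit_power_law_alpha(samples, xmin):
    """Closed-form MLE of alpha for p(x) proportional to x**(-alpha), x >= xmin.

    Assumes continuous data and a known xmin. Returns the estimate, its
    asymptotic standard error (alpha - 1) / sqrt(n), and the tail size n.
    """
    tail = [x for x in samples if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    if n == 0 or log_sum == 0.0:
        raise ValueError("need observations strictly above xmin")
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err, n


def sample_power_law(alpha, xmin, size, rng):
    """Draw from the same distribution by inverse-transform sampling."""
    exponent = -1.0 / (alpha - 1.0)
    return [xmin * (1.0 - rng.random()) ** exponent for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(alpha=2.5, xmin=1.0, size=20000, rng=rng)
    alpha_hat, std_err, n = fit_power_law_alpha(data, xmin=1.0)
    print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f} (n = {n})")
```

Fitting a straight line to a log-log histogram of the same data would not come with a comparable standard error, which is part of the paper's argument for maximum likelihood.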

[26] "Social Media as Windows on the Social Life of the Mind", arxiv:0710.4911, to appear in the AAAI spring 2008 symposium on social information processing. A programmatic paper, pulling together some ideas about cultural diffusion in networks and collective cognition, and how those topics might be studied with data from user-driven Web sites.

[27] Rob Haslinger, Kristina L. Klinkner and CRS, "The Computational Structure of Spike Trains", Neural Computation 22 (2010): 121--157, arxiv:1001.0036. Applying CSSR to real neural spike trains to recover their computational structure and statistical complexity, and detect the influence of external stimuli.

[28] "Dynamics of Bayesian Updating with Dependent Data and Misspecified Models", Electronic Journal of Statistics 3 (2009): 1039--1074, arxiv:0901.1342. What happens when all of your models are wrong, but you use Bayesian updating to weight them anyway? Answer: a process of natural selection, in which fitness is proportional to likelihood. This actually converges on the models with the smallest relative entropy rate, if your prior distribution is smart enough to respect a sieve (but then non-parametric non-Bayesian methods work too). (More.)
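The selection analogy can be seen in a few lines. The sketch below is a deliberately simplified illustration, not the setting of the paper: it assumes IID coin flips rather than dependent data, and a finite set of three Bernoulli models, none of which is correct. Each model's posterior weight is multiplied by its likelihood for the new observation and then renormalized, which is the discrete-time replicator equation with likelihood playing the role of fitness. The function names and model values are hypothetical.

```python
import math
import random


def bayes_update(log_weights, log_likelihoods):
    """One step of Bayes's rule in log space: weight times likelihood, renormalized."""
    unnormalized = [lw + ll for lw, ll in zip(log_weights, log_likelihoods)]
    peak = max(unnormalized)
    log_total = peak + math.log(sum(math.exp(u - peak) for u in unnormalized))
    return [u - log_total for u in unnormalized]


def bernoulli_log_likelihood(p, x):
    """Log-probability of a single binary observation x under success rate p."""
    return math.log(p) if x == 1 else math.log(1.0 - p)


def posterior_after(model_ps, data):
    """Posterior weights over the candidate models, starting from a uniform prior."""
    log_w = [math.log(1.0 / len(model_ps))] * len(model_ps)
    for x in data:
        log_w = bayes_update(
            log_w, [bernoulli_log_likelihood(p, x) for p in model_ps]
        )
    return [math.exp(lw) for lw in log_w]


if __name__ == "__main__":
    rng = random.Random(1)
    # The true success rate is 0.5, which is not among the candidates.
    data = [1 if rng.random() < 0.5 else 0 for _ in range(2000)]
    models = [0.2, 0.45, 0.9]
    for p, w in zip(models, posterior_after(models, data)):
        print(f"model p={p}: posterior weight {w:.4f}")
```

With enough data the weight piles up on the candidate closest to the truth in relative entropy (here p = 0.45), even though every candidate is wrong.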

[29] Shinsuke Koyama, Lucia Castellanos Pérez-Bolde, CRS and Robert E. Kass, "Approximate Methods for State-Space Models", Journal of the American Statistical Association 105 (2010): 170--180, arxiv:1004.3476. When modeling an observed time series as a noisy function of a hidden Markov process, you need to estimate the state. This is called "filtering", and doing it exactly involves some generally-intractable integrals. We approximate those integrals via Laplace's method, giving us a "Laplace-Gaussian filter". This is surprisingly fast, accurate, and stable over time, and works well in a neural decoding example.

[30] CRS and Andrew C. Thomas, "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies", Sociological Methods and Research 40 (2011): 211--239, arxiv:1004.4704. Individuals near each other in a social network tend to behave similarly; you can predict what one of them will do from what their neighbors do. Is this because they are influenced by their neighbors ("contagion"), or because social ties tend to form between people who are already similar ("homophily"), and so act alike, or some of both? We show that observational data can hardly ever answer this question, unless accompanied by very strong assumptions, like measuring everything that leads people to form social ties. (More.)

[31] Andrew Gelman and CRS, "Philosophy and the practice of Bayesian statistics", British Journal of Mathematical and Statistical Psychology 66 (2013): 8--38, arxiv:1006.3868 (with commentaries and response). What bugs me the most about many presentations of Bayesianism is the pretense that it gives an automatic method of induction, that all you need or should rationally want is the posterior probability of your theory. For actual mortals, testing, checking and revising your model remains essential, and this matches what good Bayesian data analysts actually do, though their ideology discourages them from doing it.

[32] "Scaling and Hierarchy in Urban Economies", arxiv:1102.4101.

[33] Daniel J. McDonald, CRS and Mark Schervish, "Estimating beta-mixing coefficients", pp. 516--524 in the 14th Conference on Artificial Intelligence and Statistics (AISTATS 2011), arxiv:1103.0941

[34] Daniel J. McDonald, CRS and Mark Schervish, "Generalization error bounds for stationary autoregressive models", arxiv:1103.0942

[35] CRS, Abigail Z. Jacobs, Kristina L. Klinkner, and Aaron Clauset, "Adapting to Non-stationarity with Growing Expert Ensembles", arxiv:1103.0949

[36] Daniel J. McDonald and CRS, "Rademacher complexity of stationary sequences", arxiv:1106.0730

[37] CRS and Alessandro Rinaldo, "Consistency under Sampling of Exponential Random Graph Models", Annals of Statistics 41 (2013): 508--535, arxiv:1111.3054. (More.)

[38] CRS and Aryeh (Leonid) Kontorovich, "Predictive PAC Learning and Process Decompositions", pp. 1619--1627 in NIPS 2013, arxiv:1309.4859

[39] Georg M. Goerg, CRS and Larry Wasserman, "Lebesgue Smoothing" (by request)

[40] Henry Farrell and CRS, "Selection, Evolution, and Rational Choice Institutionalism" (by request)

[41] Daniel J. McDonald, CRS and Mark Schervish, "Estimating Beta-Mixing Coefficients via Histograms", Electronic Journal of Statistics 9 (2015): 2855--2883, arxiv:1109.5998

[42] Daniel J. McDonald, CRS and Mark Schervish, "Estimated VC Dimension for Risk Bounds", arxiv:1111.3404. (More.)

[43] "Comment on 'Why and When "Flawed" Social Network Analyses Still Yield Valid Tests of No Contagion'", Statistics, Politics, and Policy 3 (2012): 5 [PDF reprint]

[44] Georg M. Goerg and CRS, "LICORS: Light Cone Reconstruction of States for Non-parametric Forecasting of Spatio-Temporal Systems", arxiv:1206.2398

[45] Xiaoran Yan, CRS, Jacob E. Jensen, Florent Krzakala, Cristopher Moore, Lenka Zdeborova, Pan Zhang and Yaojia Zhu, "Model Selection for Degree-corrected Block Models", Journal of Statistical Mechanics 2014: P05007, arxiv:1207.3994

[46] Georg M. Goerg and CRS, "Mixed LICORS: A Nonparametric Algorithm for Predictive State Reconstruction", pp. 289--297 in AIStats 2013, arxiv:1211.3760

[47] Daniel J. McDonald, CRS, and Mark J. Schervish, "Nonparametric Risk Bounds for Time-Series Forecasting", Journal of Machine Learning Research 18:32 (2017): 1--40, arxiv:1212.0463

[48] Leila Wehbe, Aaditya Ramdas, Rebecca C. Steorts and CRS, "Regularized Brain Reading with Shrinkage and Smoothing", Annals of Applied Statistics 9 (2015): 1997--2022, arxiv:1401.6595

[49] Dena Asta and CRS, "Geometric Network Comparison", pp. 102--110 in UAI 2015, arxiv:1411.1350

[50] George D. Montanez and CRS, "The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction", International Joint Conference on Neural Networks [IJCNN 2017], arxiv:1506.02686 [Winner of the Best Student Paper and Best Poster awards]

[51] Christopher N. Warren, Daniel Shore, Jessica Otis, Lawrence Wang, Mike Finegold and CRS, "Six Degrees of Francis Bacon: A Statistical Method for Reconstructing Large Historical Social Networks", Digital Humanities Quarterly 10:3 (2016) --- and Six Degrees of Francis Bacon website

[52] Edward McFowland III and CRS, "Estimating Causal Peer Influence in Homophilous Social Networks by Inferring Latent Locations", Journal of the American Statistical Association forthcoming (2022), arxiv:1607.06565

[53] Neil Spencer and CRS, "Projective Sparse Latent Space Network Models", arxiv:1709.09702

[54] Alden Green and CRS, "Bootstrapping Exchangeable Random Graphs", Electronic Journal of Statistics 16 (2022): 1058--1095, arxiv:1711.00813

[55] CRS and Dena Asta, "Consistency of Maximum Likelihood for Continuous-Space Network Models, Part I", Electronic Journal of Statistics forthcoming (2023), arxiv:1711.02123

[56] Robert Lunde and CRS, "Bootstrapping Generalization Error Bounds for Time Series", arxiv:1711.02834

[57] Octavio César Mesner, Alex Davis, Elizabeth Casman, Hyagriv Simhan, CRS, Lauren Keenan-Devlin, Ann Borders and Tamar Krishnamurti, "Using graph learning to understand adverse pregnancy outcomes and stress pathways", PLoS One 14 (2019): e0223319

[58] Octavio César Mesner and CRS, "Conditional Mutual Information Estimation for Mixed Discrete and Continuous Variables with Nearest Neighbors", IEEE Transactions on Information Theory 67 (2021): 464--484, arxiv:1912.03387

[59] CRS, "A Note on Simulation-Based Inference by Matching Random Features", arxiv:2111.09220. Some self-exposition.

[60] CRS, "Evaluating Posterior Distributions by Selectively Breeding Prior Samples", arxiv:2203.09077

[61] CRS, "A Simple Non-Stationary Mean Ergodic Theorem, with Bonus Weak Law of Large Numbers", arxiv:2203.09085

[62] William D. Fahy, CRS and Ryan Christopher Sullivan, "A universally applicable method of calculating confidence bands for ice nucleation spectra derived from droplet freezing experiments", Atmospheric Measurement Techniques 15 (2022): 6819--6836

[63] Sabina J. Sloman, Daniel M. Oppenheimer, Stephen B. Broomell, and CRS, "Characterizing the robustness of Bayesian adaptive experimental designs to active learning bias", arxiv:2205.13698

[64] Daniel J. McDonald and CRS, "Empirical Macroeconomics and DSGE Modeling in Statistical Perspective", arxiv:2210.16224. Some self-exposition.

[65] Henry Farrell and CRS, "Bias, Skew and Search Engines Are Sufficient to Explain Online Toxicity", Communications of the ACM forthcoming (2023)