CSSR

An Algorithm for Building Markov Models from Time Series

This is the homepage for CSSR (Causal State Splitting Reconstruction), an algorithm for building recursive hidden Markov models from discrete-valued time series, and other discrete sequential data. Unlike conventional hidden-Markov algorithms, which tune the parameters in a fixed architecture, CSSR is capable of inferring the appropriate architecture as well. In fact, under certain conditions, CSSR will converge on the optimal model of the underlying data-generating process.

CSSR has been implemented and successfully tested on a range of different discrete time series and sequential data streams. This page provides the full source code, released under the GNU Public License, documentation, and links to scientific papers using the CSSR algorithm.

The theory behind the algorithm was developed by Cosma Shalizi and Kristina Klinkner, under the sponsorship of James Crutchfield. The code for this implementation was written by Kristina Klinkner. For full credits, see the comments to the code, and the paper "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences".

What's Available

CSSR-v0.1.1.tar.gz (268k) is the complete C++ source code for CSSR versions 0.1.1, plus documentation and license. The last update was on 2 May 2008.

As mentioned, CSSR is released under the GNU Public License. You may want to read the Copyright, license and warranty statement (plain text, 16k) before downloading. (A copy is included in the distribution.) Please note especially that CSSR is provided with ABSOLUTELY NO WARRANTY.

The ReadMe file is available in gzipped Postscript, PDF, and HTML. (All these versions, plus plain ASCII, are included in the distribution.) The documentation was last updated on 2 May 2008. The ReadMe gives a brief overview of what CSSR does and how it works, describes usage, offers some hints on parameters, and explains the known issues with this implementation of the basic algorithm.

There is a thorough explanation of the theory behind CSSR in the paper "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences" (Cosma Rohilla Shalizi and Kristina Lisa Shalizi, pp. 504--511 in Max Chickering and Joseph Halpern (eds.), Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (Arlington, Virginia: AUAI Press, 2004 = arxiv:cs.LG/0406011). For reasons of space, that paper omits certain details, which can be found in the technical report "An Algorithm for Pattern Discovery in Time Series" (Cosma Rohilla Shalizi, Kristina Lisa Shalizi and James P. Crutchfield, Santa Fe Institute Working Paper 02-10-060 = arxiv:cs.LG/0210025).

Installing CSSR

The following instructions are for Unix-based systems, and are included in the ReadMe file.

Download CSSR-v0.1.1.tar.gz. When gunzipped and untarred, this will produce a directory called CSSR-v0.1.1, containing all the necessary header and source code files, a copy of the documentation, the release license, and a make file. Running make inside that directory will produce an executable, which should be moved to someplace in command path. On most Unix systems, the following sequence of commands will create the executable and put it in the your bin directory, usually part of your command path.

gunzip CSSR-v0.1.1.tar.gz
tar xvf CSSR-v0.1.1.tar
cd CSSR-v0.1.1
make
cp CSSR ~/bin/

The code has been successfully compiled with gcc 3.1, 3.3 and 4.0 on Macintosh OS X (10.2, 10.3 and 10.4, respectively), with gcc 3.2 on Linux (Red Hat 9), and with Microsoft Visual C++ on Windows 98 and Windows XP. (If using Visual C++, be sure to set the project type to Windows Console Application.) If you get CSSR to work on a different system, please let us know.

On some systems, compilation may produce warnings about escape sequences or the use of deprecated headers. These can be safely ignored.

Publications on CSSR

As mentioned, the principal paper on CSSR is "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", arxiv:cs.LG/0406011. This describes the algorithm and its theoretical basis in detail, gives results on time complexity and on the convergence on the reconstructed model, and compares its performance to other approaches, including the EM algorithm and cross-validation. Some details omitted from the "Blind Construction" paper can be found in the preliminary technical report, "An Algorithm for Pattern Discovery in Time Series". The latest version at arxiv.org, arxiv:cs.LG/0210025, supersedes all earlier drafts.

Citing CSSR in Your Publications

If you use CSSR in a scientific publication, please cite the "Blind Construction" paper. A sample BibTeX entry:

@inproceedings{CSSR-UAI-2004,
  author = "Cosma Rohilla Shalizi and Kristina Lisa Klinkner",
  title = "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences",
  editor = "Max Chickering and Joseph Y. Halpern",
  booktitle = "Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004)",
  publisher = "AUAI Press",
  address = "Arlington, Virginia",
  year = 2004,
  pages = "504--511",
  url = "http://arxiv.org/abs/cs.LG/0406011"}

Publications Using CSSR

Alphabetical by author.

Fabio Boschetti
"Mapping the complexity of ecological models", Ecological Complexity forthcoming (2007) [PDF]
Nicolas Brodu
"Quantifying the Effect of Learning on Recurrent Spiking Neurons", forthcoming in International Joint Conference on Neural Networks [IJCNN] 2007 [PDF]
Jean-Philippe Cointet, Emmanuel Faure and Camille Roth
"Intertemporal topic correlations in online media", in Proceedings of the International Conference on Weblogs and Social Media [ICWSM] (Boulder, CO, USA: 2007) [PDF]
Davis S. Friedlander, Isanu Chattopadhayay, Asok Ray, Shashi Phoha and Noah Jacobson
"Anomaly Prediction in Mechanical System Using Symbolic Dynamics", in Proceedings of the 2003 American Control Conference, Denver, CO, 4--6 June 2003 (2003)
David S. Friedlander, Shashi Phoha and Richard Brooks
"Determination of Vehicle Behavior based on Distributed Sensor Network Data", in Franklin T. Luk (ed.), Advanced Signal Processing Algorithms, Architectures, and Implementations XIII (Bellingham, WA: SPIE, 2003)
Kristina Lisa Klinkner, Cosma Rohilla Shalizi and Marcelo F. Camperi
"Measuring Shared Information and Coordinated Activity in Neuronal Networks", pp. 667--674 in Yair Weiss, Bernhard Sch¨lkopf and John C. Platt (eds.), Advances in Neural Information Processing Systems 18 [NIPS 2005] (Cambridge, MA: MIT Press, 2006) = arxiv:q-bio.NC/0506009
Dmitry Nerukh, George Karvounis and Robert C. Glen
"Complexity of classical dynamics of molecular systems. I. Methodology", Journal of Chemical Physics 117 (2002): 9611--9617 [PDF]
"Complexity of classical dynamics of molecular systems. II. Finite statistical complexity of a water--Na+ system", Journal of Chemical Physics 117 (2002): 9618--9622 [PDF]
"Quantifying the Complexity of Chaos in Multibasin Multidimensional Dynamics of Molecular Systems", Complexity 10:2 (2004): 40--46 [PDF]
"Dynamic Complexity of Chaotic Transitions in High-Dimensional Classical Dynamics: Leu-Enkephalin Folding", in Computational Life Sciences II, pp. 129--140 (Berlin: Springer Verlag, 2006) [PDF]
Muntsa Padró and Lluís Padró
"Applying a Finite Automata Acquisition Algorithm to Named Entity Recognition", in Proceedings of 5th International Workshop on Finite-State Methods and Natural Language Processing [FSMNLP'05] (2005) [PDF]
"A Named Entity Recognition System Based on a Finite Automata Acquisition Algorithm", Procesamiento del Lenguaje Natural 35 (2005): 319--326 [PDF]
"Approaching Sequential NLP Tasks with an Automata Acquisition Algorithm", in Proceedings of International Conference on Recent Advances in NLP [RANLP'05] (2005) [PDF]
"ME-CSSR: an Extension of CSSR using Maximum Entropy Models", in Proceedings of Finite State Methods for Natural Language Processing (FSMNLP) 2007 [PDF]
"Studying CSSR Algorithm Applicability on NLP Tasks", Procesamiento del Lenguaje Natural 39 (2007): 89--96 [PDF]
Joongwoo Brian Park, Jeong Won Lee, Jae-Suk Yang, Hang-Hyun Jo and Hie-Tae Moon
"Complexity analysis of the stock market", Physica A 379 (2007): 179--187 = arxiv:physics/0607283
Asok Ray
"Symbolic dynamic analysis of complex systems for anomaly detection", Signal Processing 84 (2004): 1115--1130
Cosma Rohilla Shalizi, Marcelo F. Camperi and Kristina Lisa Klinkner
"Discovering Functional Communities in Dynamical Networks", pp. 140--157 in Edo Airoldi et al. (eds.), Statistical Network Analysis: Models, Issues and New Directions [proceedings of an ICML 2006 workshop] (New York: Springer-Verlag, 2007) = arxiv:q-bio.NC/0609008
Dowman P. Varn and James P. Crutchfield
"From Finite to Infinite Range Order via Annealing: The Causal Architecture of Deformation Faulting in Annealed Close-Packed Crystals", Physics Letters A 324 (2004): 299--307 = arxiv:cond-mat/0307296

Contact Information

Send e-mail to Cosma Shalizi or Kristina Klinkner.

Please write us to let us know if you use CSSR in an application (especially a paper), compile it on a new platform, modify it, wish to be kept informed of future developments, or have a bug or other problem to report. However, before sending bug reports, please read the documentation carefully, and if possible the paper too, so that you're sure what you're experiencing is a bug. Since CSSR is released under the Gnu Public License, feel free to write your own fix for the bug, and tell us about it!


Created 7 August 2003 by CRS