CSSR
An Algorithm for Building Markov Models from Time Series
This is the homepage for CSSR (Causal State Splitting Reconstruction), an
algorithm for building recursive hidden Markov models from discrete-valued time
series, and other discrete sequential data. Unlike conventional hidden-Markov
algorithms, which tune the parameters in a fixed architecture, CSSR is capable
of inferring the appropriate architecture as well. In fact, under certain
conditions, CSSR will converge on the optimal model of the underlying
data-generating process.
CSSR has been implemented and successfully tested on a range of different
discrete time series and sequential data streams. This page provides the full
source code, released under the GNU Public License, documentation,
and links to scientific papers using the CSSR algorithm.
The theory behind the algorithm was developed by Cosma Shalizi and Kristina Klinkner, under
the sponsorship of James
Crutchfield. The code for this implementation was written by Kristina
Klinkner. For full credits, see the comments to the code, and the paper "Blind
Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences".
What's Available
The C++ source code, plus documentation and license, is now on Github. The documentation
gives a brief overview of what
CSSR does and how it works, describes usage, offers some hints on parameters,
and explains the known issues with this implementation of the basic
algorithm.
Please note especially that CSSR is provided with ABSOLUTELY NO
WARRANTY.
There is a thorough explanation of the theory behind CSSR in the paper
"Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete
Sequences" (Cosma Rohilla Shalizi and Kristina Lisa Shalizi, pp. 504--511 in
Max Chickering and Joseph Halpern (eds.), Uncertainty in Artificial
Intelligence: Proceedings of the Twentieth Conference (Arlington,
Virginia: AUAI Press, 2004 =
arxiv:cs.LG/0406011). For
reasons of space, that paper omits certain details, which can be found in the
technical report "An Algorithm for Pattern Discovery in Time Series" (Cosma
Rohilla Shalizi, Kristina Lisa Shalizi and James P. Crutchfield, Santa Fe
Institute Working Paper 02-10-060
= arxiv:cs.LG/0210025).
Publications on CSSR
As mentioned, the principal paper on CSSR is "Blind Construction of Optimal
Nonlinear Recursive Predictors for Discrete
Sequences", arxiv:cs.LG/0406011. This
describes the algorithm and its theoretical basis in detail, gives results on
time complexity and on the convergence on the reconstructed model, and compares
its performance to other approaches, including the EM algorithm and
cross-validation. Some details omitted from the "Blind Construction" paper can
be found in the preliminary technical report, "An Algorithm for Pattern
Discovery in Time Series". The latest version at
arxiv.org, arxiv:cs.LG/0210025, supersedes
all earlier drafts.
Citing CSSR in Your Publications
If you use CSSR in a scientific publication, please cite
the "Blind Construction" paper. A sample BibTeX entry:
@inproceedings{CSSR-UAI-2004,
author = "Cosma Rohilla Shalizi and Kristina Lisa Klinkner",
title = "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences",
editor = "Max Chickering and Joseph Y. Halpern",
booktitle = "Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004)",
publisher = "AUAI Press",
address = "Arlington, Virginia",
year = 2004,
pages = "504--511",
url = "http://arxiv.org/abs/cs.LG/0406011"}
Publications Using CSSR
Alphabetical by author. (Last updated 2008.)
- Fabio Boschetti
- "Mapping the complexity of ecological models", Ecological Complexity forthcoming (2007) [PDF]
- Nicolas Brodu
- "Quantifying the Effect of Learning on Recurrent Spiking Neurons",
forthcoming in International Joint Conference on Neural Networks [IJCNN]
2007
[PDF]
- Jean-Philippe Cointet, Emmanuel Faure and Camille Roth
- "Intertemporal topic correlations in online media", in Proceedings of the International Conference on Weblogs and Social Media [ICWSM] (Boulder, CO, USA: 2007) [PDF]
- Davis S. Friedlander, Isanu Chattopadhayay, Asok Ray, Shashi Phoha and Noah
Jacobson
- "Anomaly Prediction in Mechanical System Using Symbolic Dynamics",
in Proceedings of the 2003 American Control Conference, Denver, CO, 4--6
June 2003 (2003)
- David S. Friedlander, Shashi Phoha and Richard Brooks
- "Determination of Vehicle Behavior based on Distributed Sensor Network Data", in Franklin T. Luk (ed.), Advanced Signal Processing Algorithms, Architectures, and Implementations XIII (Bellingham, WA: SPIE, 2003)
- Kristina Lisa Klinkner, Cosma Rohilla Shalizi and Marcelo F. Camperi
- "Measuring Shared Information and Coordinated Activity in Neuronal
Networks", pp. 667--674 in Yair Weiss, Bernhard Sch¨lkopf and John C. Platt
(eds.), Advances in Neural Information Processing Systems 18 [NIPS
2005] (Cambridge, MA: MIT Press, 2006)
= arxiv:q-bio.NC/0506009
- Dmitry Nerukh, George Karvounis and Robert C. Glen
- "Complexity of classical dynamics of molecular systems.
I. Methodology", Journal of Chemical Physics 117
(2002): 9611--9617
[PDF]
- "Complexity of classical dynamics of molecular systems. II. Finite
statistical complexity of a water--Na+ system", Journal of Chemical
Physics 117 (2002): 9618--9622
[PDF]
- "Quantifying the Complexity of Chaos in Multibasin Multidimensional
Dynamics of Molecular Systems", Complexity 10:2
(2004): 40--46
[PDF]
- "Dynamic Complexity of Chaotic Transitions in High-Dimensional Classical
Dynamics: Leu-Enkephalin Folding", in Computational Life Sciences
II, pp. 129--140 (Berlin: Springer Verlag, 2006)
[PDF]
- Muntsa Padró and Lluís Padró
- "Applying a Finite Automata Acquisition Algorithm to Named Entity Recognition", in Proceedings of 5th International Workshop on Finite-State Methods and Natural Language Processing [FSMNLP'05] (2005) [PDF]
- "A Named Entity Recognition System Based on a Finite Automata Acquisition Algorithm", Procesamiento del Lenguaje Natural 35 (2005): 319--326
[PDF]
- "Approaching Sequential NLP Tasks with an Automata Acquisition Algorithm",
in Proceedings of International Conference on Recent Advances in NLP [RANLP'05] (2005) [PDF]
- "ME-CSSR: an Extension of CSSR using Maximum Entropy Models", in Proceedings of Finite State Methods for Natural Language Processing (FSMNLP) 2007 [PDF]
- "Studying CSSR Algorithm Applicability on NLP Tasks", Procesamiento del Lenguaje Natural 39 (2007): 89--96 [PDF]
- Joongwoo Brian Park, Jeong Won Lee, Jae-Suk Yang, Hang-Hyun Jo and Hie-Tae
Moon
- "Complexity analysis of the stock market", Physica A
379 (2007): 179--187 = arxiv:physics/0607283
- Asok Ray
- "Symbolic dynamic analysis of complex systems for anomaly
detection", Signal Processing 84 (2004):
1115--1130
- Cosma Rohilla Shalizi, Marcelo F. Camperi and Kristina Lisa Klinkner
- "Discovering Functional Communities in Dynamical Networks", pp. 140--157 in
Edo Airoldi et al. (eds.), Statistical Network Analysis: Models, Issues
and New Directions [proceedings of an ICML 2006 workshop] (New York:
Springer-Verlag, 2007)
= arxiv:q-bio.NC/0609008
- Dowman P. Varn and James P. Crutchfield
- "From Finite to Infinite Range Order via Annealing: The Causal Architecture
of Deformation Faulting in Annealed Close-Packed Crystals", Physics
Letters A 324 (2004): 299--307
= arxiv:cond-mat/0307296
Contact Information
Send e-mail to Cosma
Shalizi or Kristina
Klinkner.
Please write us to let us know if you use CSSR in an application (especially
a paper), compile it on a new platform, modify it, wish to be kept informed of
future developments, or have a bug or other problem to report. However, before
sending bug reports, please read the documentation carefully, and if possible
the paper too, so that you're sure what you're experiencing is a bug.
Since CSSR is released under the Gnu Public License, feel free to write your
own fix for the bug, and tell us about it!
Created 7 August 2003 by CRS