## Statistical Inference for Markov and Hidden Markov Models

*27 Feb 2017 16:30*

I am concerned here with inferring the parameters and/or the structure of the model, not with the estimation of the hidden state in a hidden Markov model with known parameters; that problem falls under filtering.

Parameter inference (what machine learning types would call "learning") given a known, fixed structure. Determining the structure ("discovery"). Order estimation as a particular case of discovery, and of model selection. (Discovery may need its own notebook soon.)

Markov models naturally form exponential families; what classes of partially-observable Markov processes also do?

See also: Markov models; Mixture Models; Statistics; Time Series; Universal Prediction Algorithms

- Recommended (big picture):
- Patrick Billingsley, Statistical Inference for Markov Chains [Foundational.]
- Olivier Cappé Eric Moulines and Tobias Ryden,
Inference in Hidden Markov Models [This is a superb book, treating
all the main statistical problems connected with HMMs with both clarity and
rigor. There is a chapter/appendix which reminds the reader who has forgotten
about Markov chains on general state spaces, but remembers measure-theoretic
probability
*very well*, about their properties. (The idea that the reader of a book on HMMs may not know what the Viterbi algorithm is, but will*of course*recall the Hahn-Jordan decomposition, strikes me as very much a product of the French school of probability theory — from which I have learned much! But it's not so far gone as to make the whole thing an exercise in the "general theory of processes".)] - Andrew Fraser, Hidden Markov Models and Dynamical Systems [Review: The Statistics of Moving Shadows]
- Peter Guttorp, Stochastic Modelling of Scientific Data

- Recommended (close-ups):
- Animashree Anandkumar, Daniel Hsu, Sham M. Kakade, "A Method of Moments for Mixture Models and Hidden Markov Models", arxiv:1203.0683
- David Blackwell and Lambert Koopmans, "On the Identifiability
Problem for Functions of Finite Markov Chains", The Annals of
Mathematical Statistics
**28**(1957): 1011--1015 [An old, but very clear, paper on the problems presented by what we now call hidden Markov models] - Peter Bühlmann and Abraham J. Wyner, "Variable Length Markov
Chains", The Annals of Statistics
**27**(1999): 480--513 [Preprint available as Berkeley statistics department technical report 479] - Olivier Cappé "Online EM Algorithm for Hidden Markov
Models", Journal of Computational and Graphical Statistics
**20**(2011): 728--749, arxiv:0908.2359 - George Cybenko and Valentino Crespi, "Learning Hidden Markov Models Using
Nonnegative Matrix Factorization", IEEE
Transactions on Information Theory
**57**(2011): 3963--3970, arxiv:0809.4086 [Though it contains an error, at least in the preprint version, about the capacities of our CSSR algorithm --- we can get model structures right with much less data than they think, though we presented examples using more data than was strictly needed.] - Randal Douc, Eric Moulines, Jimmy Olsson, and Ramon van Handel, "Consistency of the maximum likelihood estimator for general hidden Markov models",
Annals of Statistics
**39**(2011): 474--513 - P. Dupont, F. Denis and Y. Esposito, "Links between probabilistic
automata and hidden Markov models: probability distributions, learning models
and induction algorithms", Pattern
Recognition
**38**(2005): 1349--1371 - Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky, "A sticky HDP-HMM with application to speaker diarization", Annals
of Applied Statistics
**5**(2011): 1020--1056, arxiv:0905.2592 - Subhashis Ghosal and Yongqiang Tang, "Bayesian Consistency for Markov Processes", Sankhya
**68**(2006): 227--239 - Subhashis Ghosal and Aad van der Vaart, "Convergence Rates of Posterior Distributions for Non-IID
Observations", Annals of Statistics
**35**(2007): 192--223 = arxiv:0708.0491 - J. D. Kalbfleisch and J. F. Lawless, "Least-Squares Estimation of Transition Probabilities from Aggregate Data", The Canadian Journal of Statistics / La Revue Canadienne de Statistique
**12**(1984): 169--182 [JSTOR] - Mladen Kolar, Le Song, Amr Ahmed, Eric P. Xing, "Estimating time-varying networks", Annals of Applied Statistics
**4**(2010): 94--123, arxiv:0812.5087 - Gusztav Morvai and Benjamin Weiss
- "Estimating the Lengths of Memory
Words", IEEE
Transactions on Information Theory
**54**(2008): 3804--3807 - "On estimating the memory for finitarily Markovian processes", Ann. I. H. Poincaré-PR
**43**(2007): 15--30 arxiv:0712.0105 - "Order estimation of Markov chains", IEEE Transactions
on Information Theory
**51**(2005): 1496--1497, arxiv:0711.0472

- "Estimating the Lengths of Memory
Words", IEEE
Transactions on Information Theory
- F. Papangelou, "Large Deviations and the Bayesian Estimation of Higher-Order Markov Transition Functions ", Journal of Applied Probability
**33**(1996): 18--27 [JSTOR] - Yuval Peres and Paul Shields, "Two new Markov order estimators",
math.ST/0506080
[
*Very*nice.] - David Pfau, Nicholas Bartlett and Frank Wood, "Probabilistic
Deterministic Infinite Automata", NIPS 23 (2010) [PDF. Really,
mixtures over
*finite*automata with arbitrarily many states.] - George G. Roussas, Contiguity of Probability Measures: Some Applications in Statistics [Asymptotic theory of approximation, estimation and testing, for discrete-time Markov processes on fairly general state-spaces. Mini-review]
- Christopher C. Strelioff, James P. Crutchfield, Alfred W. Hubler, "Inferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate, and Out-of-class Modeling", math.ST/0703715
- Daniel R. Upper, Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models [Ph.D. thesis, math dept., Berkeley, 1997; online]

- Not
- E. Racca, F. Laio, D. Poggi and L. Ridolfi, "Test to determine the
Markov order of a time
series", Physical
Review E
**75**(2007): 011126 [The test is to linearly regress*x(t+1)*on*x(t)*,*x(t-1)*, etc., out to some finite order, and see how far back you have to go before the regression coefficients are insignificantly different from zero. This is not crazy as a first cut idea, but it's not generally valid, and in fact fails for the logistic map.]

*quite*recommended:

- S. S. Melnyk, O. V. Usatenko, V. A. Yampol'skii and V. A. Golick, "Competition between Two Kinds of Correlations in Literary Texts", physics/0402042 [This has surely got to be one of the ugliest ways of parameterizing a Markov chain I have seen; it's a miracle they don't get probabilities greater than 1, if indeed they don't.]

*Definitely*not recommended:

- Modesty forbids me to recommend:
- CRS, "Dynamics of Bayesian Updating with Dependent Data and Misspecified Models", arxiv:0901.1342
- CRS and Kristina Lisa Klinkner, "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", cs.LG/0406011 [CSSR]

- To read:
- Roberto C. Alamino, Nestor Caticha, "Online Learning in Discrete Hidden Markov Models", arxiv:0708.2377
- Armen Allahverdyan, Aram Galstyan, "On Maximum a Posteriori Estimation of Hidden Markov Processes", UAI 2009, arxiv:0906.1980
- Enrique E. Alvarez, "Estimation in Stationary Markov Renewal
Processes, with Application to Earthquake Forecasting in Turkey", Methodology and
Computing in Applied Probability
**7**(2005): 119--130 - Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", arxiv:1107.1283
- Sofia Andersson and Tobias Rydén, "Subspace estimation and
prediction methods for hidden Markov
models", Annals
of Statistics
**37**(2009): 4131--4152 - Ana Arribas-Gil, Elisabeth Gassiat and Catherine Matias, "Parameter estimation in pair hidden Markov models", math.ST/0509280
- Yves F. Atchade, "Estimation of Network structures from partially observed Markov random fields", arxiv:1108.2835
- A. R. Baigorri, C. R. Goncalves, P. A. A. Resende, "Markov Chain Order Estimation and Relative Entropy", arxiv:0910.0264
- Alexandre Belloni, Roberto I. Oliveira, "Approximate group context tree: applications to dynamic programming and dynamic choice models", arxiv:1107.0312
- Patrice Bertail and Stéphan Clémençon,
"Edgeworth expansions of suitably normalized sample mean statistics for atomic
Markov chains", Probability Theory and
Related Fields
**130**(2004): 388--414 [I need to learn more about Edgeworth expansions anyway] - P. J. Bickel and Y. Ritov, "Inference in Hidden Markov Models I: Local Asymptotic Normality in the Stationary Case", UCB Statistics Technical Report 383 [link]
- Byron Boots, "Spectral Approaches to Learning Predictive Representations" [Thesis proposal, CMU, 2011]
- Byron Boots, Sajid M. Siddiqi, Geoffrey J. Gordon, "Closing the Learning-Planning Loop with Predictive State Representations", arxiv:0912.2385
- Jose Borges and Mark Levene, "Evaluating Variable Length Markov Chain Models for Analysis of User Web Navigation Sessions", cs.AI/0606115
- J. Borwanker, G. Kallianpur and B. L. S. Prakasa Rao, "The
Bernstein-von Mises Theorem for Markov Processes", The Annals of
Mathematical Statistics
**42**(1971): 1241--1253 [JSTOR] - Carles Bretó, Daihai He, Edward L. Ionides, Aaron A. King, "Time series analysis via mechanistic models", Annals of Applied Statistics
**3**(2009): 319--348, arxiv:0802.0021 - Zhiyi Chi, "Effects of statistical dependence on multiple testing under a hidden Markov model", Annals of Statistics
**39**(2011): 439--473 - Pavel Chigansky, "Maximum Likelihood Estimator for Hidden Markov Models in continuous time",
Statistical Inference for Stochastic Processes
**12**(2009): 139--163, arxiv:0707.0271 - Christos Dimitrakakis, "Contex models on sequences of covers", arxiv:1005.2263
- C. C. Y. Dorea and L. C. Zhao, "Nonparametric Density Estimation
in Hidden Markov Models", Statistical Inference for Stochastic
Processes
**5**(2002): 55--64 - Michael DelRose, Christian Wagner, Philip Frederick, "Evidence Feed Forward Hidden Markov Model: A New Type of Hidden Markov Model", arxiv:1102.0899 [Looks like a "to be shot after a fair trial" paper from the abstract, but then, it deserves a fair trial]
- Randal Douc, "Non singularity of the asymptotic Fisher information matrix in hidden Markov models", math.ST/0511631
- Randal Douc and Eric Moulines, "Asymptotic properties of the maximum likelihood estimation in misspecified Hidden Markov models", arxiv:1110.0356
- Farzad Eskandari and Mohammad R. Meshkani, "Empirical Bayes
analysis of log-linear models for a generalized finite stationary Markov
chain", Metrika
**59**(2004): 173--191 [abstract] - Florence Forbes and Nathalie Peyrard, "Hidden Markov Random Field
Model Selection Criteria Based on Mean Field-Like Approximations", IEEE Transactions on
Pattern Analysis and Machine Intelligence
**25**(2003): 1089--1101 [PostScript preprint] - Scott D. Foster and Mark V. Bravington, "Graphical Diagnostics for Markov Models for Categorical Data", Journal of Computational and Graphical Statistics forthcoming (2011)
- Halina Frydman and Peter Lakner, "Maximum likelihood estimation of
hidden Markov
processes", Annals of
Applied Probability
**13**(2003): 1296--1312 - Cheng-Der Fuh, "Asymptotic operating characteristics of an optimal
change point detection in hidden Markov models", Annals of
Statistics
**32**(2004): 2305--2339 = math.ST/0503682 - Cheng-Der Fuh and Inchi Hu, "Estimation in hidden Markov models via efficient importance sampling", Bernoulli
**13**(2007): 492--513, arxiv:0708.4152 - Antonio Galves, Charlotte Galves, Nancy L. Garcia, Florencia Leonardi, "Context tree selection and linguistic rhythm retrieval from written texts", arxiv:0902.3619
- Antonio Galves, Aurélien Garivier, Elisabeth Gassiat, "Joint estimation of intersecting context tree models", arxiv:1102.0673
- Antonio Galves, Florencia Leonardi, "Exponential inequalities for empirical unbounded context trees", arxiv:0710.5900
- Steven T. Garren and Richard L. Smith, "Estimating the second largest eigenvalue of a Markov transition matrix", Bernoulli
**6**(2000): 215--242 **Hugo Harari-Kermadec, "Regenerative block empirical likelihood for Markov chains", arxiv:1102.3107****Daniel Hsu, Sham M. Kakade, Tong Zhang, "A Spectral Algorithm for Learning Hidden Markov Models", arxiv:0811.4413****Masashi Inoue and Naonori Ueda, "Exploitation of Unlabeled Sequences in Hidden Markov Models", IEEE Transactions on Pattern Analysis and Machine Intelligence****25**(2003): 1570--1581**H. Ito, S.-I. Amari and K. Kobayashi, "Identifiability of Hidden Markov Information Sources and Their Minimum Degrees of Freedom", IEEE Transactions on Information Theory****38**(1992): 324--333**Dezhe Z. Jin, Alexay A. Kozhevnikov, "A compact statistical model of the song syntax in Bengalese finch", arxiv:1011.2998****Hans R. Künsch, "State Space and Hidden Markov Models", pp. 109--173 in Ole E. Barndorff-Nielsen, David R. Cox and Claudia Klüppelberg (eds.), Complex Stochastic Systems****J. Lember, A. Koloydenko, "Adjusted Viterbi training for hidden Markov models", arxiv:0709.2317****Florencia Leonardi, "Some upper bounds for the rate of convergence of penalized likelihood context tree estimators", arxiv:0701810****E. Locherbach, "Likelihood Ratio Processes for Markovian Particle Systems with Killing and Jumps", Statistical Inference for Stochastic Processes****5**(2002): 153--177**Claudio Macci, "Large Deviations for Empirical Estimators of the Stationary Distribution of a Semi-Markov Process with Finite State Space", Communications in Statistics: Theory and Methods****37**(2008): 3077--3089**Philipp Metzner, Frank Noe and Christof Schutte, "Estimating the sampling error: Distribution of transition matrices and functions of transition matrices for given trajectory data", Physical Review E****80**(2009): 021106 [I presume they have a good reason for not just using the delta method, and/or bootstrapping, but I'll have to read it to see what that is]**G. Morvai and B. Weiss, "Order Estimation of Markov Chains", IEEE Transactions on Information Theory****51**(2005): 1496--1497**Elchanan Mossel, Sébastien Roch, "Learning nonsingular phylogenies and hidden Markov models", Annals of Applied Probability****16**(2006): 583--614, arxiv:cs.LG/0502076**Michael H. Neumann, Efstathios Paparoditis, "Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations", Bernoulli****14**(2008): 14--46, arxiv:0803.0835**Enza Orlandi, Eva Loecherbach, "On the neighborhood radius estimation in Variable-neighborhood Markov Random Fields", arxiv:1002.4850****Adam Paszkiewicz, "When transition count for a Markov chains is a complete sufficient statistic", Statistics and Probability Letters****76**(2006): 757--763 [When the initial and final states are fixed, for example]**Spiridon Penev, Hanxiang Peng, Atnon Schick and Wolfgang Wefelmeyer, "Efficient estimators for functionals of Markov chains with parametric marginals", Statistics and Probability Letters****66**(2004): 335--345**Ricardo Ríos, Luis-Angel Rodríguez, "Penalized estimate of the number of states in Gaussian linear AR with Markov regime", Electronic Journal of Statistics****2**(2008): 1111--1128, arxiv:0807.2726**Amr Sadek and Nikolaos Limnios, "Nonparametric estimation of reliability and survival function for continuous-time finite Markov processes", Journal of Statistical Planning and Inference****133**(2005): 1--21**Manuel S. Santos, "Consistency properties of a simulation-based estimator for dynamic processes", Annals of Applied Probability****20**(2010): 196--213**Anton Schick and Wolfgang Wefelmeyer, "Estimating Joint Distributions of Markov Chains", Statistical Inference for Stochastic Processes****5**(2002): 1--22**Peter Sunehag, Marcus Hutter, "Consistency of Feature Markov Processes", arxiv:1007.2075****V. B. Tadic, "Analyticity, Convergence, and Convergence Rate of Recursive Maximum-Likelihood Estimation in Hidden Markov Models", IEEE Transactions on Information Theory****56**(2010): 6406--6432, arxiv:0904.4264**Iuliana Teodorescu, "Maximum Likelihood Estimation for Markov Chains", arxiv:0905.4131****M. J. van der Heyden et al., "Testing the Order of Discrete Markov Chains Using Surrogate Data", Physica D****117**(1998): 299--313**Ramon van Hanel, "On the minimal penalty for Markov order estimation", Probability Theory and Related Fields****150**(2011): 709--738, arxiv:0908.3666**Elodie Vernet, "Posterior consistency for nonparametric Hidden Markov Models with finite state space", arxiv:1311.3092****Martin J. Wainwright, "Inconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting", cs.LG/0602092****Christian H. Weiss, "Rule generation for categorical time series with Markov assumptions", Statistics and Computing****21**(2011): 1--16 [VLMMs]**James Zhao, "Hidden Markov Models with Multiple Observation Processes", arxiv:1010.1042****L. C. Zhao, C. C. Y. Dorea and C. R. Gonçalves, "On Determination of the Order of a Markov Chain", Statistical Inference for Stochastic Processes****4**(2001): 273--282**Ming-Jie Zhao and Herbert Jaeger, "Making the Error-Controlling Algorithm of Observable Operator Models Constructive", Neural Computation****21**(2009): 3460--3486