Statistical Inference for Markov and Hidden Markov Models
Last update: 08 Dec 2024 12:25First version: 28 September 2007
I am concerned here with inferring the parameters and/or the structure of the model, not with the estimation of the hidden state in a hidden Markov model with known parameters; that problem falls under filtering.
Parameter inference (what machine learning types would call "learning") given a known, fixed structure. Determining the structure ("discovery"). Order estimation as a particular case of discovery, and of model selection. (Discovery may need its own notebook soon.)
Markov models naturally form exponential families; what classes of partially-observable Markov processes also do?
- See also:
- Inference for Stochastic Differential Equations (strictly speaking a sub-topic, but one with enough special features to get separate treatment)
- Markov models
- Mixture Models [An HMM with a finite number of hidden states is like a mixture model with serial dependence between observations...]
- Prediction Processes; Markovian (and Conceivably Causal) Representations of Stochastic Processes
- Statistics
- Time Series
- Universal Prediction Algorithms
- Recommended (big picture):
- Patrick Billingsley, Statistical Inference for Markov Chains [Foundational.]
- Olivier Cappé Eric Moulines and Tobias Ryden, Inference in Hidden Markov Models [This is a superb book, treating all the main statistical problems connected with HMMs with both clarity and rigor. There is a chapter/appendix which reminds the reader who has forgotten about Markov chains on general state spaces, but remembers measure-theoretic probability very well, about their properties. (The idea that the reader of a book on HMMs may not know what the Viterbi algorithm is, but will of course recall the Hahn-Jordan decomposition, strikes me as very much a product of the French school of probability theory --- from which I have learned much! But it's not so far gone as to make the whole thing an exercise in the "general theory of processes".)]
- Andrew Fraser, Hidden Markov Models and Dynamical Systems [Review: The Statistics of Moving Shadows]
- Peter Guttorp, Stochastic Modelling of Scientific Data
- Kevin McGoff, Sayan Mukherjee, and Natesh Pillai, "Statistical inference for dynamical systems: A review", Statistics Surveys 9 (2015): 209--252, arxiv:1204.6265
- Recommended (close-ups):
- Animashree Anandkumar, Daniel Hsu, Sham M. Kakade, "A Method of Moments for Mixture Models and Hidden Markov Models", arxiv:1203.0683
- Peter J. Bickel and Ya'acov Ritov, "Inference in hidden Markov models I: Local asymptotic normality in the stationary case", Bernoulli 2 (1996): 199--228 [Not sure if Part II ever appeared...]
- David Blackwell and Lambert Koopmans, "On the Identifiability Problem for Functions of Finite Markov Chains", The Annals of Mathematical Statistics 28 (1957): 1011--1015 [An old, but very clear, paper on the problems presented by what we now call hidden Markov models]
- Peter Bühlmann and Abraham J. Wyner, "Variable Length Markov Chains", The Annals of Statistics 27 (1999): 480--513 [Preprint available as Berkeley statistics department technical report 479]
- Olivier Cappé "Online EM Algorithm for Hidden Markov Models", Journal of Computational and Graphical Statistics 20 (2011): 728--749, arxiv:0908.2359
- George Cybenko and Valentino Crespi, "Learning Hidden Markov Models Using Nonnegative Matrix Factorization", IEEE Transactions on Information Theory 57 (2011): 3963--3970, arxiv:0809.4086 [Though it contains an error, at least in the preprint version, about the capacities of our CSSR algorithm --- we can get model structures right with much less data than they think, though we presented examples using more data than was strictly needed.]
- Randal Douc, Eric Moulines, Jimmy Olsson, and Ramon van Handel, "Consistency of the maximum likelihood estimator for general hidden Markov models", Annals of Statistics 39 (2011): 474--513
- P. Dupont, F. Denis and Y. Esposito, "Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms", Pattern Recognition 38 (2005): 1349--1371
- Julian Feldman and Joe F. Hanna", "The Structure of Responses to a Sequence of Binary Events", Journal of Mathematical Psychology 3 (1966): 371--387
- J. D. Foulkes, "A Class of Machines which Determine the Statistical Structure of a Sequence of Characters", pp. 66-73 of vol. 4 of the Western Electronics [Wescon] Convention Record, 1959 (Institute of Radio Engineers) [PDF]
- Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky, "A sticky HDP-HMM with application to speaker diarization", Annals of Applied Statistics 5 (2011): 1020--1056, arxiv:0905.2592
- Sandro Gallo and Florencia Leonardi, "Nonparametric statistical inference for the context tree of a stationary ergodic process", Electronic Journal of Statistics 9 (2015): 2076--2098, arxiv:1411.7650
- Subhashis Ghosal and Yongqiang Tang, "Bayesian Consistency for Markov Processes", Sankhya 68 (2006): 227--239
- Subhashis Ghosal and Aad van der Vaart, "Convergence Rates of Posterior Distributions for Non-IID Observations", Annals of Statistics 35 (2007): 192--223, arxiv:0708.0491
- Christopher C. Heyde, Quasi-Likelihood and Its Applications: A General Approach to Optimal Parameter Estimation
- J. D. Kalbfleisch and J. F. Lawless, "Least-Squares Estimation of Transition Probabilities from Aggregate Data", The Canadian Journal of Statistics / La Revue Canadienne de Statistique 12 (1984): 169--182 [JSTOR]
- Mladen Kolar, Le Song, Amr Ahmed, Eric P. Xing, "Estimating time-varying networks", Annals of Applied Statistics 4 (2010): 94--123, arxiv:0812.5087
- Kevin McGoff, Andrew B. Nobel, "Empirical risk minimization and complexity of dynamical models", Annals of Statistics 48 (2020): 2031--2054, arxiv:1611.06173
- Gusztav Morvai and Benjamin Weiss
- "Estimating the Lengths of Memory Words", IEEE Transactions on Information Theory 54 (2008): 3804--3807
- "On estimating the memory for finitarily Markovian processes", Ann. I. H. Poincaré-PR 43 (2007): 15--30 arxiv:0712.0105
- "Order estimation of Markov chains", IEEE Transactions on Information Theory 51 (2005): 1496--1497, arxiv:0711.0472
- Ioannis Papageorgiou, Ioannis Kontoyiannis, "Context-tree weighting for real-valued time series: Bayesian inference with hierarchical mixture models", arxiv:2106.03023
- F. Papangelou, "Large Deviations and the Bayesian Estimation of Higher-Order Markov Transition Functions ", Journal of Applied Probability 33 (1996): 18--27 [JSTOR]
- Yuval Peres and Paul Shields, "Two new Markov order estimators", math.ST/0506080
- David Pfau, Nicholas Bartlett and Frank Wood, "Probabilistic Deterministic Infinite Automata", NIPS 23 (2010) [PDF. Really, mixtures over finite automata with arbitrarily many states.]
- George G. Roussas, Contiguity of Probability Measures: Some Applications in Statistics [Asymptotic theory of approximation, estimation and testing, for discrete-time Markov processes on fairly general state-spaces. Mini-review]
- Christopher C. Strelioff, James P. Crutchfield, Alfred W. Hubler, "Inferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate, and Out-of-class Modeling", math.ST/0703715
- Daniel R. Upper, Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models [Ph.D. thesis, math dept., Berkeley, 1997; online]
- Not quite recommended:
- E. Racca, F. Laio, D. Poggi and L. Ridolfi, "Test to determine the Markov order of a time series", Physical Review E 75 (2007): 011126 [The test is to linearly regress x(t+1) on x(t), x(t-1), etc., out to some finite order, and see how far back you have to go before the regression coefficients are insignificantly different from zero. This is not crazy as a first cut idea, but it's not generally valid, and in fact fails for the logistic map.]
- Definitely not recommended:
- S. S. Melnyk, O. V. Usatenko, V. A. Yampol'skii and V. A. Golick, "Competition between Two Kinds of Correlations in Literary Texts", physics/0402042 [This has surely got to be one of the ugliest ways of parameterizing a Markov chain I have seen; it's a miracle they don't get probabilities greater than 1, if indeed they don't.]
- Modesty forbids me to recommend:
- CRS, "Dynamics of Bayesian Updating with Dependent Data and Misspecified Models", arxiv:0901.1342
- CRS and Kristina Lisa Klinkner, "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", cs.LG/0406011 [CSSR]
- To read:
- Roberto C. Alamino, Nestor Caticha, "Online Learning in Discrete Hidden Markov Models", arxiv:0708.2377
- Armen Allahverdyan, Aram Galstyan, "On Maximum a Posteriori Estimation of Hidden Markov Processes", UAI 2009, arxiv:0906.1980
- Enrique E. Alvarez, "Estimation in Stationary Markov Renewal Processes, with Application to Earthquake Forecasting in Turkey", Methodology and Computing in Applied Probability 7 (2005): 119--130
- Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", arxiv:1107.1283
- Sofia Andersson and Tobias Rydén, "Subspace estimation and prediction methods for hidden Markov models", Annals of Statistics 37 (2009): 4131--4152
- Ana Arribas-Gil, Elisabeth Gassiat and Catherine Matias, "Parameter estimation in pair hidden Markov models", math.ST/0509280
- Yves F. Atchade, "Estimation of Network structures from partially observed Markov random fields", arxiv:1108.2835
- A. R. Baigorri, C. R. Goncalves, P. A. A. Resende, "Markov Chain Order Estimation and Relative Entropy", arxiv:0910.0264
- Alexandre Belloni, Roberto I. Oliveira, "Approximate group context tree: applications to dynamic programming and dynamic choice models", arxiv:1107.0312
- Patrice Bertail and Stéphan Clémençon, "Edgeworth expansions of suitably normalized sample mean statistics for atomic Markov chains", Probability Theory and Related Fields 130 (2004): 388--414 [I need to learn more about Edgeworth expansions anyway]
- Byron Boots, "Spectral Approaches to Learning Predictive Representations" [Thesis proposal, CMU, 2011]
- Byron Boots, Sajid M. Siddiqi, Geoffrey J. Gordon, "Closing the Learning-Planning Loop with Predictive State Representations", arxiv:0912.2385
- Jose Borges and Mark Levene, "Evaluating Variable Length Markov Chain Models for Analysis of User Web Navigation Sessions", cs.AI/0606115
- J. Borwanker, G. Kallianpur and B. L. S. Prakasa Rao, "The Bernstein-von Mises Theorem for Markov Processes", The Annals of Mathematical Statistics 42 (1971): 1241--1253 [JSTOR]
- Carles Bretó, Daihai He, Edward L. Ionides, Aaron A. King, "Time series analysis via mechanistic models", Annals of Applied Statistics 3 (2009): 319--348, arxiv:0802.0021
- Prafulla Chandra, Andrew Thangaraj, Nived Rajaraman, "How good is Good-Turing for Markov samples?", arxiv:2102.01938
- Zhiyi Chi, "Effects of statistical dependence on multiple testing under a hidden Markov model", Annals of Statistics 39 (2011): 439--473
- Pavel Chigansky, "Maximum Likelihood Estimator for Hidden Markov Models in continuous time", Statistical Inference for Stochastic Processes 12 (2009): 139--163, arxiv:0707.0271
- Antonio Anastasio Bruto da Costa, Pallab Dasgupta, "Learning Temporal Causal Sequence Relationships from Real-Time Time-Series", arxiv:1905.12262
- Christos Dimitrakakis, "Contex models on sequences of covers", arxiv:1005.2263
- C. C. Y. Dorea and L. C. Zhao, "Nonparametric Density Estimation in Hidden Markov Models", Statistical Inference for Stochastic Processes 5 (2002): 55--64
- Michael DelRose, Christian Wagner, Philip Frederick, "Evidence Feed Forward Hidden Markov Model: A New Type of Hidden Markov Model", arxiv:1102.0899 [Looks like a "to be shot after a fair trial" paper from the abstract, but then, it deserves a fair trial]
- Randal Douc, "Non singularity of the asymptotic Fisher information matrix in hidden Markov models", math.ST/0511631
- Randal Douc and Eric Moulines, "Asymptotic properties of the maximum likelihood estimation in misspecified Hidden Markov models", arxiv:1110.0356
- Thierry Dumont, "Context Tree Estimation in Variable Length Hidden Markov Models", arxiv:1109.0392
- Farzad Eskandari and Mohammad R. Meshkani, "Empirical Bayes analysis of log-linear models for a generalized finite stationary Markov chain", Metrika 59 (2004): 173--191 [abstract]
- Florence Forbes and Nathalie Peyrard, "Hidden Markov Random Field Model Selection Criteria Based on Mean Field-Like Approximations", IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003): 1089--1101 [PostScript preprint]
- Scott D. Foster and Mark V. Bravington, "Graphical Diagnostics for Markov Models for Categorical Data", Journal of Computational and Graphical Statistics forthcoming (2011)
- Halina Frydman and Peter Lakner, "Maximum likelihood estimation of hidden Markov processes", Annals of Applied Probability 13 (2003): 1296--1312
- Cheng-Der Fuh, "Asymptotic operating characteristics of an optimal change point detection in hidden Markov models", Annals of Statistics 32 (2004): 2305--2339 = math.ST/0503682
- Cheng-Der Fuh and Inchi Hu, "Estimation in hidden Markov models via efficient importance sampling", Bernoulli 13 (2007): 492--513, arxiv:0708.4152
- Antonio Galves, Charlotte Galves, Nancy L. Garcia, Florencia Leonardi, "Context tree selection and linguistic rhythm retrieval from written texts", arxiv:0902.3619
- Antonio Galves, Aurélien Garivier, Elisabeth Gassiat, "Joint estimation of intersecting context tree models", arxiv:1102.0673
- Antonio Galves, Florencia Leonardi, "Exponential inequalities for empirical unbounded context trees", arxiv:0710.5900
- Nancy L. Garcia, Lucas Moreira, "Stochastically Perturbed Chains of Variable Memory", arxiv:1305.5747
- Steven T. Garren and Richard L. Smith, "Estimating the second largest eigenvalue of a Markov transition matrix", Bernoulli 6 (2000): 215--242
- Hugo Harari-Kermadec, "Regenerative block empirical likelihood for Markov chains", arxiv:1102.3107
- Daniel Hsu, Sham M. Kakade, Tong Zhang, "A Spectral Algorithm for Learning Hidden Markov Models", arxiv:0811.4413
- Masashi Inoue and Naonori Ueda, "Exploitation of Unlabeled Sequences in Hidden Markov Models", IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003): 1570--1581
- H. Ito, S.-I. Amari and K. Kobayashi, "Identifiability of Hidden Markov Information Sources and Their Minimum Degrees of Freedom", IEEE Transactions on Information Theory 38 (1992): 324--333
- Dezhe Z. Jin, Alexay A. Kozhevnikov, "A compact statistical model of the song syntax in Bengalese finch", arxiv:1011.2998
- Hans R. Künsch, "State Space and Hidden Markov Models", pp. 109--173 in Ole E. Barndorff-Nielsen, David R. Cox and Claudia Klüppelberg (eds.), Complex Stochastic Systems
- J. Lember, A. Koloydenko, "Adjusted Viterbi training for hidden Markov models", arxiv:0709.2317
- Florencia Leonardi, "Some upper bounds for the rate of convergence of penalized likelihood context tree estimators", arxiv:0701810
- E. Locherbach, "Likelihood Ratio Processes for Markovian Particle Systems with Killing and Jumps", Statistical Inference for Stochastic Processes 5 (2002): 153--177
- Claudio Macci, "Large Deviations for Empirical Estimators of the Stationary Distribution of a Semi-Markov Process with Finite State Space", Communications in Statistics: Theory and Methods 37 (2008): 3077--3089
- A. Martin, Gadiel Seroussi, and Marcelo J. Weinberger, "Type Classes of Context Trees", IEEE Transactions on Information Theory 58 (2012): 4077--4093
- Kevin McGoff, Sayan Mukherjee, Andrew Nobel, "Gibbs posterior convergence and the thermodynamic formalism", arxiv:1901.08641
- Kevin McGoff, Sayan Mukherjee, Andrew Nobel, Natesh Pillai, "Consistency of maximum likelihood estimation for some dynamical systems", Annals of Statistics 43 (2015): 1--29, arxiv:1306.5603
- Kevin McGoff, Andrew B. Nobel, "Variational analysis of inference from dynamical systems", arxiv:1601.05033
- Philipp Metzner, Frank Noe and Christof Schutte, "Estimating the sampling error: Distribution of transition matrices and functions of transition matrices for given trajectory data", Physical Review E 80 (2009): 021106 [I presume they have a good reason for not just using the delta method, and/or bootstrapping, but I'll have to read it to see what that is]
- Elchanan Mossel, Sébastien Roch, "Learning nonsingular phylogenies and hidden Markov models", Annals of Applied Probability 16 (2006): 583--614, arxiv:cs.LG/0502076
- Michael H. Neumann, Efstathios Paparoditis, "Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations", Bernoulli 14 (2008): 14--46, arxiv:0803.0835
- Alexander O'Neill, Marcus Hutter, Wen Shao, Peter Sunehag, "Adaptive Context Tree Weighting", arxiv:1201.2056
- Roberto Imbuzeiro Oliveira, "Stochastic processes with random contexts: a characterization, and adaptive estimators for the transition probabilities", arxiv:1309.2819
- Enza Orlandi, Eva Loecherbach, "On the neighborhood radius estimation in Variable-neighborhood Markov Random Fields", arxiv:1002.4850
- Adam Paszkiewicz, "When transition count for a Markov chains is a complete sufficient statistic", Statistics and Probability Letters 76 (2006): 757--763 [When the initial and final states are fixed, for example]
- Spiridon Penev, Hanxiang Peng, Atnon Schick and Wolfgang Wefelmeyer, "Efficient estimators for functionals of Markov chains with parametric marginals", Statistics and Probability Letters 66 (2004): 335--345
- Ricardo Ríos, Luis-Angel Rodríguez, "Penalized estimate of the number of states in Gaussian linear AR with Markov regime", Electronic Journal of Statistics 2 (2008): 1111--1128, arxiv:0807.2726
- Amr Sadek and Nikolaos Limnios, "Nonparametric estimation of reliability and survival function for continuous-time finite Markov processes", Journal of Statistical Planning and Inference 133 (2005): 1--21
- Manuel S. Santos, "Consistency properties of a simulation-based estimator for dynamic processes", Annals of Applied Probability 20 (2010): 196--213
- Anton Schick and Wolfgang Wefelmeyer, "Estimating Joint Distributions of Markov Chains", Statistical Inference for Stochastic Processes 5 (2002): 1--22
- Peter Sunehag, Marcus Hutter, "Consistency of Feature Markov Processes", arxiv:1007.2075
- V. B. Tadic, "Analyticity, Convergence, and Convergence Rate of Recursive Maximum-Likelihood Estimation in Hidden Markov Models", IEEE Transactions on Information Theory 56 (2010): 6406--6432, arxiv:0904.4264
- Zsolt Talata and Tyrone E. Duncan, "BIC Context Tree Estimation for Stationary Ergodic Processes", IEEE Transactions on Information Theory (2011): 3877--3886
- Iuliana Teodorescu, "Maximum Likelihood Estimation for Markov Chains", arxiv:0905.4131
- M. J. van der Heyden et al., "Testing the Order of Discrete Markov Chains Using Surrogate Data", Physica D 117 (1998): 299--313
- Ramon van Hanel, "On the minimal penalty for Markov order estimation", Probability Theory and Related Fields 150 (2011): 709--738, arxiv:0908.3666
- Joel Veness, Kee Siong Ng, Marcus Hutter, Michael Bowling, "Context Tree Switching", arxiv:1111.3182
- Elodie Vernet, "Posterior consistency for nonparametric Hidden Markov Models with finite state space", arxiv:1311.3092
- Martin J. Wainwright, "Inconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting", cs.LG/0602092
- Christian H. Weiss, "Rule generation for categorical time series with Markov assumptions", Statistics and Computing 21 (2011): 1--16 [VLMMs]
- James Zhao, "Hidden Markov Models with Multiple Observation Processes", arxiv:1010.1042
- L. C. Zhao, C. C. Y. Dorea and C. R. Gonçalves, "On Determination of the Order of a Markov Chain", Statistical Inference for Stochastic Processes 4 (2001): 273--282
- Ming-Jie Zhao and Herbert Jaeger, "Making the Error-Controlling Algorithm of Observable Operator Models Constructive", Neural Computation 21 (2009): 3460--3486