Change-Point Problems
08 Dec 2023 11:00
Suppose you have a time series which has some (stochastic) property you're interested in, say its expected value or its variance. You think this is usually constant, but that if it does change, it does so abruptly. You would like to know whether such a change has happened, and if so, to localize the time when it did. You now have a change-point problem.
In an "offline" setting, where you've collected all you're data and want to analyze it in retrospect, there are fairly straightforward approaches, especially if you are willing to invoke parametric models. We imagine that we've got data \( X_1, X_2, \ldots X_T \equiv X_{1:T} \), and that if there's no change point then these follow some known probability distribution \( p(x_{1:T}; \theta_0) \). If on the other hand there is a break after time \( t \), then the distribution is \( p(x_{1:t};\theta_0) p(x_{t+1:T};\theta_1) \). (If you really want to condition the after-break stuff on the pre-break stuff go ahead.) The Neyman-Pearson lemma tells us that the optimal test is the likelihood ratio test, so we should locate the break-point at \( t \) if the log likelihood ratio \[ \log{\frac{p(x_{1:t};\theta_0) p(x_{t+1:T};\theta_1)}{p(x_{1:T};\theta_0)}} \] is sufficiently large. If neither the pre-change nor post-change parameters are exactly known, we can often still get good results by substituting in their maximum likelihood estimators (or other consistent, efficient estimators), say \[ \log{\frac{p(x_{1:t};\hat{\theta}_0) p(x_{t+1:T};\hat{\theta}_1)}{p(x_{1:T};\tilde{\theta}_0)}} \] If the basic parametric family has \( d \) continuous parameters, then the version with the change point at time \( t \) has \( 2d \) continuous parameters, and we'd typically get a \( \chi^2_{2d-d} = \chi^2_{d} \) distribution for the distribution of the log likelihood ratio under the null hypothesis of no change. (Notice that the estimate of \( \theta_0 \) should be different under the null of no change and the alternative of a change at \( t \), so I've given the different estimators different accent marks.)
We usually also don't just care about one particular possible change point \( t \), but rather want to consider a whole range of possible break times. This will often be handled by maximizing the likelihood over different possible values of \( t \), as in the sketch below. Since that's not a continuous parameter, the usual \( \chi^2 \) theory doesn't apply.
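Continuing the Gaussian illustration (and reusing log_lr_at from the previous snippet), here is one sketch of the scan over candidate break times. Because we maximize over \( t \), the \( \chi^2 \) calibration no longer applies; this example falls back on simulating the scan statistic under the fitted no-change model, which is one workable choice of calibration, not the only one.

```python
# Scan over candidate break times and calibrate the maximized statistic by
# simulation under the fitted no-change Gaussian model (an illustrative
# choice, since the chi^2 approximation fails once we maximize over t).
def scan_statistic(x, min_seg=5):
    """Return (max log LR, argmax t), keeping at least min_seg points per segment."""
    T = len(x)
    candidates = list(range(min_seg, T - min_seg + 1))
    lrs = [log_lr_at(x, t) for t in candidates]
    i = int(np.argmax(lrs))
    return lrs[i], candidates[i]

def null_quantile(x, level=0.95, n_sim=500, rng=None):
    """Approximate the null quantile of the scan statistic by simulation."""
    rng = np.random.default_rng() if rng is None else rng
    mu_hat, sigma_hat = np.mean(x), np.std(x)
    sims = [scan_statistic(rng.normal(mu_hat, sigma_hat, size=len(x)))[0]
            for _ in range(n_sim)]
    return np.quantile(sims, level)

# Usage: declare a break, located at t_hat, if stat exceeds the simulated quantile.
# stat, t_hat = scan_statistic(x)
# if stat > null_quantile(x): ...
```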
- Recommended:
- Sylvain Arlot and Alain Celisse, "Segmentation of the mean of heteroscedastic data via cross-validation", Statistics and Computing 21 (2011): 613--632, arxiv:0902.3977 [MATLAB code]
- Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky, "A sticky HDP-HMM with application to speaker diarization", Annals of Applied Statistics 5 (2011): 1020--1056, arxiv:0905.2592
- Daniil Ryabko and Boris Ryabko, "Testing Statistical Hypotheses About Ergodic Processes", arxiv:0804.0510 [Appears to be the same as their "Nonparametric Statistical Inference for Ergodic Processes", IEEE Transactions on Information Theory 56 (2010): 1430--1435]
- Wenguang Sun, T. Tony Cai, "Large-scale multiple testing under dependence", Journal of the Royal Statistical Society B 71 (2008): 393--424
- Albert Vexler, "Martingale Type Statistics Applied to Change Point Detection", Communications in Statistics - Theory and Methods 37 (2008): 1207--1224
- Recommended, historical:
- G. A. Barnard, "Control Charts and Stochastic Processes", Journal of the Royal Statistical Society B 21 (1959): 239--271 [JSTOR]
- P. J. Harrison and C. F. Stevens, "A Bayesian Approach to Short-term Forecasting", Operational Research Quarterly 22 (1971): 341--362 [A reasonable approach, based on Kalman smoothing, with a wild application unintended by the authors]
- E. S. Page, "Continuous Inspection Schemes", Biometrika 41 (1954): 100--115 [JSTOR. Origin of the famous "cumulative sum" ("CUSUM") procedure for finding breaks.]
- A. Wald and J. Wolfowitz, "Optimum Character of the Sequential Probability Ratio Test", Annals of Mathematical Statistics 19 (1948): 326--339
- To read:
- Alexander Aue and Lajos Horváth, "Structural breaks in time series", Journal of Time Series Analysis 34 (2013): 1--16
- Boris Brodsky and Boris Darkhovsky, "Sequential change-point detection for mixing random sequences under composite hypotheses", Statistical Inference for Stochastic Processes 11 (2008): 35--54
- S. Camargo, S. M. Duarte Queirós and C. Anteneodo, "Nonparametric segmentation of nonstationary time series", Physical Review E 84 (2011): 046702
- Cheng-Der Fuh
- "SPRT and CUSUM in hidden Markov models", Annals of Statistics 31 (2003): 942--977
- "Asymptotic operating characteristics of an optimal change point detection in hidden Markov models", Annals of Statistics 32 (2004): 2305--2339 = math.ST/0503682
- Noah D. Gade, Jordan Rodu, "Change Point Detection with Conceptors", arxiv:2308.06213
- Samir Ben Hariz, Jonathan J. Wylie, Qiang Zhang, "Optimal rate of convergence for nonparametric change-point estimators for nonstationary sequences", Annals of Statistics 35 (2007): 1802--1826, arxiv:0710.4217
- Heping He and Thomas A. Severini, "Asymptotic properties of maximum likelihood estimators in models with multiple change points", Bernoulli 16 (2010): 759--779, arxiv:1102.5224
- C. T. Jose, B. Ismail, S. Jayasekhar, "Trend, Growth Rate, and Change Point Analysis: A Data Driven Approach", Communications in Statistics: Simulation and Computation 37 (2008): 498--506
- Azadeh Khaleghi, Daniil Ryabko, "Multiple Change-Point Estimation in Stationary Ergodic Time-Series", arxiv:1203.1515
- Shiqing Ling, "Testing for change points in time series models and limiting theorems for NED sequences", Annals of Statistics 35 (2007): 1213--1237, arxiv:0708.2369
- George V. Moustakides, "Sequential change detection revisited", arxiv:0804.0741 = Annals of Statistics 36 (2008): 787--807
- Chun Yip Yau and Richard A. Davis, "Likelihood inference for discriminating between long-memory and change-point models", Journal of Time Series Analysis 33 (2012): 649--664