Change-Point Problems
08 Dec 2023 11:00
Suppose you have a time series which has some (stochastic) property you're interested in, say its expected value or its variance. You think this is usually constant, but that if it does change, it does so abruptly. You would like to know whether such a change has happened, and if so, to localize the time when it did. You now have a change-point problem.
In an "offline" setting, where you've collected all you're data and want to analyze it in retrospect, there are fairly straightforward approaches, especially if you are willing to invoke parametric models. We imagine that we've got data \( X_1, X_2, \ldots X_T \equiv X_{1:T} \), and that if there's no change point then these follow some known probability distribution \( p(x_{1:T}; \theta_0) \). If on the other hand there is a break after time \( t \), then the distribution is \( p(x_{1:t};\theta_0) p(x_{t+1:T};\theta_1) \). (If you really want to condition the after-break stuff on the pre-break stuff go ahead.) The Neyman-Pearson lemma tells us that the optimal test is the likelihood ratio test, so we should locate the break-point at \( t \) if the log likelihood ratio \[ \log{\frac{p(x_{1:t};\theta_0) p(x_{t+1:T};\theta_1)}{p(x_{1:T};\theta_0)}} \] is sufficiently large. If neither the pre-change nor post-change parameters are exactly known, we can often still get good results by substituting in their maximum likelihood estimators (or other consistent, efficient estimators), say \[ \log{\frac{p(x_{1:t};\hat{\theta}_0) p(x_{t+1:T};\hat{\theta}_1)}{p(x_{1:T};\tilde{\theta}_0)}} \] If the basic parametric family has \( d \) continuous parameters, then the version with the change point at time \( t \) has \( 2d \) continuous parameters, and we'd typically get a \( \chi^2_{2d-d} = \chi^2_{d} \) distribution for the distribution of the log likelihood ratio under the null hypothesis of no change. (Notice that the estimate of \( \theta_0 \) should be different under the null of no change and the alternative of a change at \( t \), so I've given the different estimators different accent marks.)
We usually also don't just care about one particular possible change point \( t \), but rather want to consider a whole range of possible break times. This will often be handled by maximizing the likelihood over different possible values of \( t \), as in the sketch below. Since that's not a continuous parameter, the usual \( \chi^2 \) theory doesn't apply.
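Continuing the Gaussian illustration (and reusing log_lr_at from the previous snippet), here is one sketch of the scan over candidate break times. Because we maximize over \( t \), the \( \chi^2 \) calibration no longer applies; this example falls back on simulating the scan statistic under the fitted no-change model, which is one workable choice of calibration, not the only one.

```python
# Scan over candidate break times and calibrate the maximized statistic by
# simulation under the fitted no-change Gaussian model (an illustrative
# choice, since the chi^2 approximation fails once we maximize over t).
def scan_statistic(x, min_seg=5):
    """Return (max log LR, argmax t), keeping at least min_seg points per segment."""
    T = len(x)
    candidates = list(range(min_seg, T - min_seg + 1))
    lrs = [log_lr_at(x, t) for t in candidates]
    i = int(np.argmax(lrs))
    return lrs[i], candidates[i]

def null_quantile(x, level=0.95, n_sim=500, rng=None):
    """Approximate the null quantile of the scan statistic by simulation."""
    rng = np.random.default_rng() if rng is None else rng
    mu_hat, sigma_hat = np.mean(x), np.std(x)
    sims = [scan_statistic(rng.normal(mu_hat, sigma_hat, size=len(x)))[0]
            for _ in range(n_sim)]
    return np.quantile(sims, level)

# Usage: declare a break, located at t_hat, if stat exceeds the simulated quantile.
# stat, t_hat = scan_statistic(x)
# if stat > null_quantile(x): ...
```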
- Recommended:
- Sylvain Arlot and Alain Celisse, "Segmentation of the mean of heteroscedastic data via cross-validation", Statistics and Computing 21 (2011): 613--632, arxiv:0902.3977 [MATLAB code]
- Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky, "A sticky HDP-HMM with application to speaker diarization", Annals of Applied Statistics 5 (2011): 1020--1056, arxiv:0905.2592
- Daniil Ryabko and Boris Ryabko, "Testing Statistical Hypotheses About Ergodic Processes", arxiv:0804.0510 [Appears to be the same as their "Nonparametric Statistical Inference for Ergodic Processes", IEEE Transactions on Information Theory 56 (2010): 1430--1435]
- Wenguang Sun, T. Tony Cai, "Large-scale multiple testing under dependence", Journal of the Royal Statistical Society B 71 (2008): 393--424
- Albert Vexler, "Martingale Type Statistics Applied to Change Point Detection", Communications in Statistics - Theory and Methods 37 (2008): 1207--1224
- Recommended, historical:
- G. A. Barnard, "Control Charts and Stochastic Processes", Journal of the Royal Statistical Society B 21 (1959): 239--271 [JSTOR]
- P. J. Harrison and C. F. Stevens, "A Bayesian Approach to Short-term Forecasting", Operational Research Quarterly 22 (1971): 341--362 [A reasonable approach, based on Kalman smoothing, with a wild application unintended by the authors]
- E. S. Page, "Continuous Inspection Schemes", Biometrika 41 (1954): 100--115 [JSTOR. Origin of the famous "cumulative sum" ("CUSUM") procedure for finding breaks.]
- A. Wald and J. Wolfowitz, "Optimum Character of the Sequential Probability Ratio Test", Annals of Mathematical Statistics 19 (1948): 326--339
- To read:
- Alexander Aue and Lajos Horváth, "Structural breaks in time series", Journal of Time Series Analysis 34 (2013): 1--16
- Boris Brodsky and Boris Darkhovsky, "Sequential change-point detection for mixing random sequences under composite hypotheses", Statistical Inference for Stochastic Processes 11 (2008): 35--54
- S. Camargo, S. M. Duarte Queirós and C. Anteneodo, "Nonparametric segmentation of nonstationary time series", Physical Review E 84 (2011): 046702
- Cheng-Der Fuh
- "SPRT and CUSUM in hidden Markov models", Annals of Statistics 31 (2003): 942--977
- "Asymptotic operating characteristics of an optimal change point detection in hidden Markov models", Annals of Statistics 32 (2004): 2305--2339 = math.ST/0503682
- Noah D. Gade, Jordan Rodu, "Change Point Detection with Conceptors", arxiv:2308.06213
- Samir Ben Hariz, Jonathan J. Wylie, Qiang Zhang, "Optimal rate of convergence for nonparametric change-point estimators for nonstationary sequences", Annals of Statistics 35 (2007): 1802--1826, arxiv:0710.4217
- Heping He and Thomas A. Severini, "Asymptotic properties of maximum likelihood estimators in models with multiple change points", Bernoulli 16 (2010): 759--779, arxiv:1102.5224
- C. T. Jose, B. Ismail, S. Jayasekhar, "Trend, Growth Rate, and Change Point Analysis: A Data Driven Approach", Communications in Statistics: Simulation and Computation 37 (2008): 498--506
- Azadeh Khaleghi, Daniil Ryabko, "Multiple Change-Point Estimation in Stationary Ergodic Time-Series", arxiv:1203.1515
- Shiqing Ling, "Testing for change points in time series models and limiting theorems for NED sequences", Annals of Statistics 35 (2007): 1213--1237, arxiv:0708.2369
- George V. Moustakides, "Sequential change detection revisited", arxiv:0804.0741 = Annals of Statistics 36 (2008): 787--807
- Chun Yip Yau and Richard A. Davis, "Likelihood inference for discriminating between long-memory and change-point models", Journal of Time Series Analysis 33 (2012): 649--664