Notebooks

Change-Point Problems

Last update: 08 Dec 2024 00:26
First version: 23 December 2011, major expansion 16 March 2020

Suppose you have a time series which has some (stochastic) property you're interested in, say its expected value or its variance. You think this is usually constant, but that if it does change, it does so abruptly. You would like to know if and when it changes, and perhaps to localize the time when it did. You now have a change-point problem.

In an "offline" setting, where you've collected all your data and want to analyze it in retrospect, there are fairly straightforward approaches, especially if you are willing to invoke parametric models. We imagine that we've got data \( X_1, X_2, \ldots X_T \equiv X_{1:T} \), and that if there's no change point then these follow some known probability distribution \( p(x_{1:T}; \theta_0) \). If, on the other hand, there is a break after time \( t \), then the distribution is \( p(x_{1:t};\theta_0) p(x_{t+1:T};\theta_1) \). (If you really want to condition the after-break stuff on the pre-break stuff, go ahead.) The Neyman-Pearson lemma tells us that the optimal test is the likelihood ratio test, so we should declare a break at \( t \) if the log likelihood ratio \[ \log{\frac{p(x_{1:t};\theta_0) p(x_{t+1:T};\theta_1)}{p(x_{1:T};\theta_0)}} \] is sufficiently large. If neither the pre-change nor post-change parameters are exactly known, we can often still get good results by substituting in their maximum likelihood estimators (or other consistent, efficient estimators), say \[ \log{\frac{p(x_{1:t};\hat{\theta}_0) p(x_{t+1:T};\hat{\theta}_1)}{p(x_{1:T};\tilde{\theta}_0)}} \] If the basic parametric family has \( d \) continuous parameters, then the version with the change point at time \( t \) has \( 2d \) continuous parameters, and under the null hypothesis of no change we'd typically get a \( \chi^2_{2d-d} = \chi^2_{d} \) distribution for twice the log likelihood ratio. (Notice that the estimate of \( \theta_0 \) should be different under the null of no change and the alternative of a change at \( t \), so I've given the different estimators different accent marks.)
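To make the plug-in version concrete, here is a minimal sketch for a single, fixed candidate break time, assuming Gaussian data so that `scipy.stats.norm.fit` gives the maximum likelihood estimates; the function name and the Gaussian choice are my illustrative assumptions, not anything canonical:

```python
import numpy as np
from scipy import stats

def loglik_ratio(x, t, dist=stats.norm):
    """Plug-in log likelihood ratio for a break after time t.

    Assumes an i.i.d. parametric model within each segment; dist.fit
    returns the maximum likelihood estimates of its parameters
    (for stats.norm, the sample mean and the MLE of the scale)."""
    pre, post = x[:t], x[t:]
    # Alternative: separate MLEs before and after the candidate break
    ll_alt = (dist.logpdf(pre, *dist.fit(pre)).sum()
              + dist.logpdf(post, *dist.fit(post)).sum())
    # Null: one MLE fit to the whole series (the "tilde" estimator)
    ll_null = dist.logpdf(x, *dist.fit(x)).sum()
    return ll_alt - ll_null
```

Since the alternative nests the null, the plug-in ratio is always non-negative; for the Gaussian family here, \( d = 2 \), so twice this quantity would be referred to a \( \chi^2_2 \) distribution when \( t \) is fixed in advance.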

We usually also don't care about just one particular possible change point \( t \), but rather want to consider a range of possibilities. This is often handled by maximizing the likelihood over the different possible values of \( t \). Since \( t \) is not a continuous parameter, the usual \( \chi^2 \) theory doesn't apply to the maximized ratio.
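The maximization itself is just a scan over candidate break times; a sketch, again under my Gaussian assumption, with a `min_seg` guard (my choice of name and default) to keep the segment MLEs away from absurdly short segments:

```python
import numpy as np
from scipy import stats

def scan_break(x, min_seg=10, dist=stats.norm):
    """Return the candidate break time maximizing the plug-in log
    likelihood ratio, and the maximized ratio itself."""
    def llr(t):
        pre, post = x[:t], x[t:]
        return (dist.logpdf(pre, *dist.fit(pre)).sum()
                + dist.logpdf(post, *dist.fit(post)).sum()
                - dist.logpdf(x, *dist.fit(x)).sum())
    # Scan over break times leaving at least min_seg points per segment
    best = max(range(min_seg, len(x) - min_seg), key=llr)
    return best, llr(best)
```

Because the \( \chi^2 \) calibration fails for the maximized statistic, in practice one would calibrate its null distribution some other way, e.g., by simulating series of the same length from the fitted no-change model and recording the scan maximum for each.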
