Forecasting Non-Stationary Processes, and Estimating Their Parameters
Last update: 07 Dec 2024 23:47
First version: 6 March 2011
Some non-stationary processes are in fact easy to forecast. Periodic processes, for example, are strictly speaking not stationary, yet they are easy to predict. An ergodic Markov chain started far from its invariant distribution is also non-stationary, but easy to predict (its distribution will approach the invariant one). Both of these cases are conditionally stationary, which I think is all that's really needed.
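To make the Markov chain case concrete, here is a minimal sketch in Python with NumPy. The three-state transition matrix `P`, the starting distribution, and the helper names are all made up for illustration; nothing here comes from the references below. It tracks the marginal distribution of a chain started with all its mass on one state, and measures its total variation distance from the invariant distribution.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1), chosen only for
# illustration. It is irreducible and aperiodic, hence ergodic.
P = np.array([
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
])


def invariant_distribution(P):
    """Left eigenvector of P with eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()


def marginal_distributions(P, start, n_steps):
    """Marginal distribution of the chain at times 0, 1, ..., n_steps."""
    dists = [np.asarray(start, dtype=float)]
    for _ in range(n_steps):
        dists.append(dists[-1] @ P)
    return np.array(dists)


def total_variation(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * np.abs(p - q).sum()


pi = invariant_distribution(P)
# Start far from invariance: all probability on state 0.
dists = marginal_distributions(P, start=[1.0, 0.0, 0.0], n_steps=200)
tv = np.array([total_variation(d, pi) for d in dists])

print("invariant distribution:", np.round(pi, 4))
print("TV distance at t = 0, 10, 50, 200:", tv[[0, 10, 50, 200]])
```

The marginal distribution changes with time, so the process is not stationary, but the transition law is fixed, and a forecaster who knows (or has estimated) `P` predicts just as well during the transient as at equilibrium.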
What's more interesting is the problem of, so to speak, really non-stationary processes. It's hard to imagine that there is any way to truly predict an arbitrary non-stationary process. (Basically: as soon as you think you have established a trend-line, the Adversary can always reverse the trend, without creating any problems of consistency with earlier data.) If you can constrain the class of allowable non-stationary processes, however, then something might be possible. Alternatively, one might lower expectations, not to actually predicting well, but to predicting with low regret, i.e., doing nearly as well as the best predictor in some comparison class would have done in hindsight.
I actually have an Idea about using model averaging here, but need to find the time to work on it.
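For orientation only, and emphatically not as a rendering of the Idea just mentioned, here is a minimal sketch of one standard low-regret method for shifting environments: a Fixed Share forecaster in the spirit of Herbster and Warmuth's "Tracking the Best Expert" (listed below). The assumptions are mine: squared loss on outcomes in [0, 1], a uniform-mixing form of the share step (a common simplification, not necessarily the exact update in that paper), and illustrative values for the learning rate `eta` and the switching rate `alpha`. The toy data are two constant experts and a sequence whose level flips half-way through, as a stand-in for the Adversary reversing the trend.

```python
import numpy as np


def fixed_share(expert_preds, outcomes, eta=0.5, alpha=0.05):
    """Fixed Share forecaster (uniform-mixing variant; a sketch, see lead-in).

    expert_preds: array of shape (T, N), expert i's prediction at time t.
    outcomes:     array of shape (T,), values assumed to lie in [0, 1].
    eta:          learning rate for the exponential-weights step.
    alpha:        share rate; alpha = 0 gives plain exponential weights.
    Returns the forecasts, shape (T,), and the weights used, shape (T, N).
    """
    T, N = expert_preds.shape
    w = np.full(N, 1.0 / N)
    forecasts = np.empty(T)
    weights = np.empty((T, N))
    for t in range(T):
        weights[t] = w
        forecasts[t] = w @ expert_preds[t]           # weighted-average forecast
        losses = (expert_preds[t] - outcomes[t]) ** 2
        v = w * np.exp(-eta * losses)                # exponential-weights update
        v /= v.sum()
        w = (1.0 - alpha) * v + alpha / N            # share step: weight floor
    return forecasts, weights


# Toy sequence: the level is 0 for 100 steps, then flips to 1.
T = 200
outcomes = np.concatenate([np.zeros(100), np.ones(100)])
# Two hypothetical experts: one always predicts 0, the other always 1.
expert_preds = np.column_stack([np.zeros(T), np.ones(T)])

fs_forecasts, fs_weights = fixed_share(expert_preds, outcomes, alpha=0.05)
ew_forecasts, ew_weights = fixed_share(expert_preds, outcomes, alpha=0.0)

fs_loss = np.sum((fs_forecasts - outcomes) ** 2)
ew_loss = np.sum((ew_forecasts - outcomes) ** 2)
print("cumulative squared loss, fixed share:        ", fs_loss)
print("cumulative squared loss, exponential weights:", ew_loss)
```

The share step keeps every expert's weight above `alpha / N`, so the forecaster can re-weight within a few steps of the flip. With `alpha = 0` the weight on the newly-correct expert has decayed exponentially during the first regime, and it takes about as long again to recover. The price is a small persistent loss in stable periods, since some weight always sits on the wrong expert.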
- See also:
- Ensemble Methods in Machine Learning
- Low-Regret Learning
- Optimal Linear Prediction and Estimation
- Time Series
- Universal Prediction
- Recommended, bigger picture:
- Oren Anava, Elad Hazan, Shie Mannor, Ohad Shamir, "Online Learning for Time Series Prediction", arxiv:1302.6927
- S. Caires and J. A. Ferreira, "On the Non-parametric Prediction of Conditionally Stationary Sequences", Statistical Inference for Stochastic Processes 8 (2005): 151--184
- R. Dahlhaus, "Fitting Time Series Models to Nonstationary Processes", Annals of Statistics 25 (1997): 1--37
- Mark Herbster and Manfred K. Warmuth, "Tracking the Best Expert", Machine Learning 32 (1998): 151--178 [PS version via Dr. Herbster]
- Claire Monteleoni and Tommi S. Jaakkola, "Online Learning of Non-stationary Sequences", pp. 1093--1100 in NIPS 2003 (vol. 16) [Figuring out at what rate to switch between experts]
- Joaquin Quinonero-Candela, Masashi Sugiyama, Anton Schwaighofer and Neil D. Lawrence (eds.), Dataset Shift in Machine Learning
- Recommended, close-ups:
- David T. Frazier and Bonsoo Koo, "Indirect inference for locally stationary models", Journal of Econometrics 223 (2021): 1--27 [Comments/queries]
- Elad Hazan and Satyen Kale, "Extracting certainty from uncertainty: regret bounded by variation in costs", Machine Learning 80 (2010): 165--188
- Jeremy Zico Kolter and Marcus A. Maloof
- "Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts", Journal of Machine Learning Research 8 (2007): 2755--2790
- "Using Additive Expert Ensembles to Cope with Concept Drift", ICML 2005 [PDF reprint via Kolter]
- Wouter M. Koolen and Tim van Erven, "Switching between Hidden Markov Models using Fixed Share", arxiv:1008.4532
- Claire Monteleoni, Gavin Schmidt, Shailesh Saroha and Eva Asplund, "Tracking Climate Models", Statistical Analysis and Data Mining 4 (2011): 372--392 [While I list it as a "close-up" in this context, it's probably more important, in terms of its potential impact, than everything else on this page... PDF reprint via Prof. Monteleoni.]
- Maxim Raginsky, Roummel F. Marcia, Jorge Silva and Rebecca M. Willett, "Sequential Probability Assignment via Online Convex Programming Using Exponential Families" [ISIT 2009; PDF]
- Maxim Raginsky, Rebecca M. Willett, C. Horn, Jorge Silva and Roummel F. Marcia, "Sequential anomaly detection in the presence of noise and limited feedback", IEEE Transactions on Information Theory 58 (2012): 5544--5562, arxiv:0911.2904
- Kyupil Yeon, Moon Sup Song, Yongdai Kim, Hosik Choi, Cheolwoo Park, "Model averaging via penalized regression for tracking concept drift", Journal of Computational and Graphical Statistics 19 (2010): 457--473
- Pride compels me to recommend my students' work:
- Abigail Z. Jacobs, Adapting to non-stationarity with growing predictor ensembles [Senior thesis, Northwestern University, 2011]
- Michael Spece, Competitive Analysis for Machine Learning and Data Science [Ph.D. thesis, CMU, 2019. In this connection, see specifically ch. 3.]
- Modesty forbids me to recommend [the above-mentioned "Idea"]:
- CRS, Abigail Z. Jacobs, Kristina Lisa Klinkner and Aaron Clauset, "Adapting to Non-stationarity with Growing Expert Ensembles", arxiv:1103.0949
- To read (with thanks to Michael Wieck-Sosa for references on locally-stationary processes):
- István Berkes, Lajos Horváth and Shiqing Ling, "Estimation in nonstationary random coefficient autoregressive models", Journal of Time Series Analysis 30 (2009): 395--416 ["the unit root problem does not exist in the RCA model"!]
- O. Besbes, Y. Gur, A. Zeevi, "Non-stationary Stochastic Optimization", arxiv:1307.5449
- Satish T. S. Bukkapatnam and Changqing Cheng, "Forecasting the evolution of nonlinear and nonstationary systems using recurrence-based local Gaussian process models", Physical Review E 82 (2010): 056206
- Alexey Chernov, Vladimir Vovk, "Prediction with Advice of Unknown Number of Experts", arxiv:1006.0475
- Michael P. Clements and David F. Hendry, Forecasting Non-Stationary Economic Time Series
- Rainer Dahlhaus and Wolfgang Polonik, "Empirical spectral processes for locally stationary time series", Bernoulli 15 (2009): 1--39, arxiv:0902.1448
- Rainer Dahlhaus, Stefan Richter, Wei Biao Wu, "Towards a general theory for nonlinear locally stationary processes", Bernoulli 25 (2019): 1013--1044
- Jiti Gao, Bin Peng, Wei Biao Wu, and Yayi Yan, "Time-varying multivariate causal processes", Journal of Econometrics 240 (2024): 105671
- Eyal Gofer, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, "Regret Minimization for Branching Experts", COLT 2013 / Journal of Machine Learning Research Workshops and Conference Proceedings 30 (2013): 618--638
- P. J. Harrison and C. F. Stevens, "A Bayesian Approach to Short-term Forecasting", Operational Research Quarterly 22 (1971): 341--362
- Ching-Kang Ing, Jin-Lung Lin, Shu-Hui Yu, "Toward optimal multistep forecasts in non-stationary autoregressions", Bernoulli 15 (2009): 402--437, arxiv:0906.2266 ["Optimal" assuming that you know you are facing a linear AR model.]
- Yan Karklin and Michael S. Lewicki, "A Hierarchical Bayesian Model for Learning Nonlinear Statistical Regularities in Nonstationary Natural Signals", Neural Computation 17 (2005): 397--423
- Dennis Kristensen, Young Jun Lee, "Local Polynomial Estimation of Time-Varying Parameters in Nonlinear Models", arxiv:1904.05209
- Zudi Lu, Dag Johan Steinskog, Dag Tjostheim and Qiwei Yao, "Adaptively Varying-Coefficient Spatiotemporal Models", Journal of the Royal Statistical Society B 71 (2009): 859--880 [PDF preprint]
- Alexander O'Neill, Marcus Hutter, Wen Shao, Peter Sunehag, "Adaptive Context Tree Weighting", arxiv:1201.2056
- Joshua W. Robinson, Alexander J. Hartemink, "Learning Non-Stationary Dynamic Bayesian Networks", Journal of Machine Learning Research 11 (2010): 3647--3680
- Masashi Sugiyama and Motoaki Kawanabe, Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation
- Nina Vaits, Edward Moroshko, Koby Crammer, "Second-Order Non-Stationary Online Learning for Regression", arxiv:1303.0140
- P. F. Verdes, P. M. Granitto and H. A. Ceccatto, "Overembedding Method for Modeling Nonstationary Systems", Physical Review Letters 96 (2006): 118701
- Michael Vogt and Holger Dette, "Detecting gradual changes in locally stationary processes", Annals of Statistics 43 (2015): 713--740, arxiv:1310.4678 and/or arxiv:1403.3808
- Qiying Wang and Peter C. B. Phillips, "A specification test for nonlinear nonstationary models", Annals of Statistics 40 (2012): 727--758
- Ou Zhao, Michael Woodroofe, "Estimating a monotone trend", arxiv:0812.3188
- Shuheng Zhou, John Lafferty, Larry Wasserman, "Time Varying Undirected Graphs", arxiv:0802.2758
- To write:
- CRS + co-conspirators to be named later, "This Time Is Different"