## Forecasting Non-Stationary Processes

*09 Mar 2024 13:35*

Some non-stationary processes are in fact easy to forecast. Periodic processes, for example, are strictly speaking non-stationary, but perfectly predictable. An ergodic Markov chain started far from its invariant distribution is also non-stationary, but easy to predict (it will approach the stationary distribution). Both of these cases are conditionally stationary, which I think is all that's really needed.
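To make the Markov-chain case concrete, here is a minimal sketch (the two-state transition matrix and the starting distribution are invented for illustration): the chain's marginal distribution changes over time, so the process is non-stationary, yet its drift toward the invariant distribution is entirely predictable.

```python
import numpy as np

# Hypothetical two-state ergodic chain; the numbers are made up for illustration.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# The invariant distribution pi solves pi P = pi; here pi = (2/3, 1/3).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Start far from stationarity: the marginal distribution at each step is
# different (non-stationarity), but it converges geometrically to pi,
# so the forecaster knows exactly where the process is headed.
dist = np.array([1.0, 0.0])
for t in range(50):
    dist = dist @ P
```

After fifty steps `dist` agrees with `pi` to well below 1e-6, the convergence rate being set by the second-largest eigenvalue of `P` (here 0.7).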

What's more interesting is the problem of so to speak *really*
non-stationary processes. It's hard to imagine that there is any way to truly
predict an *arbitrary* non-stationary process. (Basically: as soon as
you think you have established a trend-line, the Adversary can always reverse
the trend, without creating any problems of consistency with earlier data.) If
you can constrain the class of allowable non-stationary processes, however,
then something might be possible. Alternatively, one might lower one's
expectations, aiming not at actually predicting well but at predicting with low regret.
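The low-regret alternative can be illustrated with a fixed-share forecaster in the style of Herbster and Warmuth (cited below): exponential weighting over a pool of experts, plus a small uniform redistribution of weight each round so the forecaster can re-adapt after the process shifts. Everything concrete here (the two constant experts, the shifted-mean data, the parameter values) is made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-stationary data: the mean jumps halfway through the sequence.
T = 2000
y = np.concatenate([rng.normal(0.0, 0.1, T // 2),
                    rng.normal(1.0, 0.1, T // 2)])

experts = np.array([0.0, 1.0])   # two hypothetical experts, each predicting a constant
eta, alpha = 2.0, 0.01           # learning rate; share rate
w = np.ones(2) / 2               # initial weights

total_loss = 0.0
expert_loss = np.zeros(2)
for t in range(T):
    pred = w @ experts                    # weighted-average prediction
    total_loss += (pred - y[t]) ** 2
    losses = (experts - y[t]) ** 2
    expert_loss += losses
    # Fixed-share update: exponential weighting, then mix in a little
    # uniform weight so a "dead" expert can recover after a shift.
    w = w * np.exp(-eta * losses)
    w = w / w.sum()
    w = (1 - alpha) * w + alpha / len(w)
```

On this toy sequence the forecaster's cumulative loss ends up far below that of either fixed expert, i.e., it has low regret against the best *sequence* of experts. With the share step removed (`alpha = 0`), the weight on the first expert would collapse so far before the shift that recovery takes much longer; choosing that switching rate is the question the Monteleoni–Jaakkola paper below addresses.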

I actually have an Idea about using model averaging here, but need to find the time to work on it.

- See also:
- Ensemble Methods in Machine Learning
- Optimal Linear Prediction and Estimation
- Low-Regret Learning
- Time Series
- Universal Prediction

- Recommended, bigger picture:
- Oren Anava, Elad Hazan, Shie Mannor, Ohad Shamir, "Online Learning for Time Series Prediction", arxiv:1302.6927
- S. Caires and J. A. Ferreira, "On the Non-parametric Prediction of Conditionally Stationary Sequences", Statistical Inference for Stochastic Processes **8**(2005): 151--184
- R. Dahlhaus, "Fitting Time Series Models to Nonstationary Processes", Annals of Statistics **25**(1997): 1--37
- Mark Herbster and Manfred K. Warmuth, "Tracking the Best Expert", Machine Learning **32**(1998): 151--178 [PS version via Dr. Herbster]
- Claire Monteleoni and Tommi S. Jaakkola, "Online Learning of Non-stationary Sequences", pp. 1093--1100 in NIPS 2003 (vol. 16) [Figuring out at what rate to switch between experts]
- Joaquin Quinonero-Candela, Masashi Sugiyama, Anton Schwaighofer and Neil D. Lawrence (eds.), Dataset Shift in Machine Learning

- Recommended, close-ups:
- Elad Hazan and Satyen Kale, "Extracting certainty from uncertainty: regret bounded by variation in costs", Machine Learning **80**(2010): 165--188
- Jeremy Zico Kolter and Marcus A. Maloof
    - "Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts", Journal of Machine Learning Research **8**(2007): 2755--2790
    - "Using Additive Expert Ensembles to Cope with Concept Drift", ICML 2005 [PDF reprint via Kolter]

- Wouter M. Koolen and Tim van Erven, "Switching between Hidden Markov Models using Fixed Share", arxiv:1008.4532
- Claire Monteleoni, Gavin Schmidt, Shailesh Saroha and Eva Asplund, "Tracking Climate Models", Statistical Analysis and Data Mining **4**(2011): 372--392 [While I list it as a "close-up" in this context, it's probably more important, in terms of its potential impact, than everything else on this page... PDF reprint via Prof. Monteleoni.]
- Maxim Raginsky, Roummel F. Marcia, Jorge Silva and Rebecca M. Willett, "Sequential Probability Assignment via Online Convex Programming Using Exponential Families" [ISIT 2009; PDF]
- Maxim Raginsky, Rebecca M. Willett, C. Horn, Jorge Silva and Roummel F. Marcia, "Sequential anomaly detection in the presence of noise and limited feedback", IEEE Transactions on Information Theory **58**(2012): 5544--5562, arxiv:0911.2904
- Kyupil Yeon, Moon Sup Song, Yongdai Kim, Hosik Choi, Cheolwoo Park, "Model averaging via penalized regression for tracking concept drift", Journal of Computational and Graphical Statistics **19**(2010): 457--473

- Pride compels me to recommend my students' work:
- Abigail Z. Jacobs, Adapting to non-stationarity with growing predictor ensembles [Senior thesis, Northwestern University, 2011]
- Michael Spece, Competitive Analysis for Machine Learning and Data Science [Ph.D. thesis, CMU, 2019. In this connection, see specifically ch. 3.]

- Modesty forbids me to recommend:
- CRS, Abigail Z. Jacobs, Kristina Lisa Klinkner and Aaron Clauset, "Adapting to Non-stationarity with Growing Expert Ensembles", arxiv:1103.0949

- To read:
- István Berkes, Lajos Horváth and Shiqing Ling, "Estimation in nonstationary random coefficient autoregressive models", Journal of Time Series Analysis **30**(2009): 395--416 ["the unit root problem does not exist in the RCA model"!]
- O. Besbes, Y. Gur, A. Zeevi, "Non-stationary Stochastic Optimization", arxiv:1307.5449
- Satish T. S. Bukkapatnam and Changqing Cheng, "Forecasting the evolution of nonlinear and nonstationary systems using recurrence-based local Gaussian process models", Physical Review E **82**(2010): 056206
- Alexey Chernov, Vladimir Vovk, "Prediction with Advice of Unknown Number of Experts", arxiv:1006.0475
- Michael P. Clements and David F. Hendry, Forecasting Non-Stationary Economic Time Series
- Rainer Dahlhaus and Wolfgang Polonik, "Empirical spectral processes for locally stationary time series", Bernoulli **15**(2009): 1--39, arxiv:0902.1448
- Eyal Gofer, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, "Regret Minimization for Branching Experts", COLT 2013 / Journal of Machine Learning Research Workshops and Conference Proceedings **30**(2013): 618--638
- P. J. Harrison and C. F. Stevens, "A Bayesian Approach to Short-term Forecasting", Operational Research Quarterly **22**(1971): 341--362
- Ching-Kang Ing, Jin-Lung Lin, Shu-Hui Yu, "Toward optimal multistep forecasts in non-stationary autoregressions", Bernoulli **15**(2009): 402--437, arxiv:0906.2266 ["Optimal" assuming that you know you are facing a linear AR model.]
- Yan Karklin and Michael S. Lewicki, "A Hierarchical Bayesian Model for Learning Nonlinear Statistical Regularities in Nonstationary Natural Signals", Neural Computation **17**(2005): 397--423
- Zudi Lu, Dag Johan Steinskog, Dag Tjostheim and Qiwei Yao, "Adaptively Varying-Coefficient Spatiotemporal Models", Journal of the Royal Statistical Society B **71**(2009): 859--880 [PDF preprint]
- Alexander O'Neill, Marcus Hutter, Wen Shao, Peter Sunehag, "Adaptive Context Tree Weighting", arxiv:1201.2056
- Joshua W. Robinson, Alexander J. Hartemink, "Learning Non-Stationary Dynamic Bayesian Networks", Journal of Machine Learning Research **11**(2010): 3647--3680
- Masashi Sugiyama and Motoaki Kawanabe, Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation
- Nina Vaits, Edward Moroshko, Koby Crammer, "Second-Order Non-Stationary Online Learning for Regression", arxiv:1303.0140
- P. F. Verdes, P. M. Granitto and H. A. Ceccatto, "Overembedding Method for Modeling Nonstationary Systems", Physical Review Letters **96**(2006): 118701
- Michael Vogt and Holger Dette, "Detecting gradual changes in locally stationary processes", Annals of Statistics **43**(2015): 713--740, arxiv:1310.4678 and/or arxiv:1403.3808
- Qiying Wang and Peter C. B. Phillips, "A specification test for nonlinear nonstationary models", Annals of Statistics **40**(2012): 727--758
- Ou Zhao, Michael Woodroofe, "Estimating a monotone trend", arxiv:0812.3188
- Shuheng Zhou, John Lafferty, Larry Wasserman, "Time Varying Undirected Graphs", arxiv:0802.2758

- To write:
- CRS + co-conspirators to be named later, "This Time Is Different"