Universal Prediction Algorithms
17 Sep 2024 11:29
Given: a single time series, perhaps a very long one, from a stochastic process which is basically unknown; perhaps all we know is that it is stationary and ergodic.
Desired: a forecast which will converge on the best possible forecast, as the series becomes longer and longer. Or: the best possible forecast from within a fixed class of forecasting algorithms.
A solution is called a universal prediction algorithm because it applies equally to all the processes within the class, and is not tailored to any one of them.
This has connections to information theory (via universal compression algorithms), to the problem of finding Markovian representations and inference for Markov models, and to many other topics. Presumably such algorithms could also be used for bootstrapping time series.
Addendum, September 2024: This is a topic which I began studying in graduate school, then mostly left fallow except for noticing the occasional reference. But probabilistic sequence prediction is having an unexpected day of fame, and I want to think through what this literature might have to contribute...
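To fix ideas, here is a minimal sketch of one classical universal scheme: a Bayesian mixture of finite-order binary Markov models, each using the Krichevsky-Trofimov (add-1/2) estimator. For any source of Markov order at most `MAX_ORDER`, the mixture's per-symbol log-loss converges to that of the best model in the class. This is a toy illustration of the general idea, not any particular algorithm from the references below; the maximum order and uniform prior are illustrative choices.

```python
# Toy universal predictor for binary sequences: a uniform Bayesian mixture
# over Markov models of orders 0..MAX_ORDER, each context using the
# Krichevsky-Trofimov (add-1/2) estimator.
from collections import defaultdict

MAX_ORDER = 3  # largest Markov order in the model class (illustrative)

def kt_prob(counts, symbol):
    """KT (add-1/2) conditional probability of `symbol` given [n0, n1] counts."""
    return (counts[symbol] + 0.5) / (counts[0] + counts[1] + 1.0)

def mixture_predict(history):
    """Posterior-weighted probability that the next bit is 1, given `history`."""
    weights, probs = [], []
    for k in range(MAX_ORDER + 1):
        counts = defaultdict(lambda: [0, 0])  # per-context [count of 0, count of 1]
        like = 1.0  # sequential likelihood of the history under the order-k model
        for t, x in enumerate(history):
            ctx = tuple(history[max(0, t - k):t])  # last k symbols (shorter at the start)
            like *= kt_prob(counts[ctx], x)
            counts[ctx][x] += 1
        ctx = tuple(history[max(0, len(history) - k):])
        weights.append(like)  # uniform prior over orders, so weight = likelihood
        probs.append(kt_prob(counts[ctx], 1))
    z = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / z
```

On an alternating sequence the posterior concentrates on the models that capture the order-1 structure, and the mixture's prediction quickly approaches the (deterministic) truth.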
- See also:
- Conformal Prediction
- Ergodic Theory
- Learning in Games
- Learning Theory
- Low-Regret Learning
- Machine Learning, Statistical Inference and Induction
- Sequential Decisions Under Uncertainty
- Time series
- Recommended, big picture:
- Paul H. Algoet
- "Universal Schemes for Prediction, Gambling, and Portfolio Selection," Annals of Probability 20 (1992): 901--941 and an important Correction, 23 (1995): 474--478
- "Universal Schemes for Learning the Best Nonlinear Predictor Given the Infinite Past and Side Information," IEEE Transactions on Information Theory 45 (1999): 1165--1185
- Nicolo Cesa-Bianchi and Gabor Lugosi, Prediction, Learning, and Games
- Donald Ornstein and Benjamin Weiss, "How Sampling Reveals a Process", Annals of Probability 18 (1990): 905--930 [Open access. A truly beautiful and inspiring paper. --- The negative results here would seem to depend on their very strong notion of what it means to reconstruct a process. For instance, while in their sense it is not possible to always discriminate between two processes (unless they are Bernoulli), Ryabko and Ryabko (arxiv:0804.0510) give a consistent test to do just this for ergodic processes, not necessarily Bernoulli, by employing a weaker notion of inter-process distance.]
- Vladimir Vovk, Alex Gammerman and Glenn Shafer, Algorithmic Learning in a Random World [Mini-review]
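The Ryabko-and-Ryabko-style discrimination mentioned in the Ornstein-Weiss annotation above rests on an empirical distance between processes: compare the empirical frequencies of all short words in the two samples, with geometrically decaying weights by word length. For stationary ergodic sources this converges to a genuine distance between the processes, so thresholding it gives a consistent discrimination procedure. The weights and word-length cutoff below are illustrative choices, not the paper's exact construction.

```python
# Weighted empirical distance between two binary samples, built from
# word frequencies up to a fixed length.
from itertools import product

def word_freq(x, w):
    """Empirical frequency of the word w (a tuple) in the sequence x."""
    k = len(w)
    n = len(x) - k + 1
    if n <= 0:
        return 0.0
    return sum(tuple(x[i:i + k]) == w for i in range(n)) / n

def empirical_distance(x, y, max_len=4):
    """Sum over word lengths k of 2^-k times the total frequency discrepancy."""
    d = 0.0
    for k in range(1, max_len + 1):
        for w in product((0, 1), repeat=k):
            d += 2.0 ** (-k) * abs(word_freq(x, w) - word_freq(y, w))
    return d
```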
- Recommended, close-ups:
- Alekh Agarwal and John C. Duchi, "The Generalization Ability of Online Algorithms for Dependent Data", arxiv:1110.2529 [If you have a learning algorithm which achieves low regret against some class of alternatives, and the data source is suitably strongly-mixing, your algorithm will generalize almost as well as the best rival.]
- Pierre Alquier and Olivier Wintenberger, "Model selection and randomization for weakly dependent time series forecasting", Bernoulli 18 (2012): 883--913, arxiv:0902.2924
- Imre Csiszar and Zsolt Talata, "On Rate of Convergence of Statistical Estimation of Stationary Ergodic Processes", IEEE Transactions on Information Theory 56 (2010): 3637--3641
- Shane Legg, "Is There an Elegant Universal Theory of Prediction?", cs.AI/0606070 [A nice set of diagonalization arguments against the hope of a universal prediction scheme which has the nice features of Solomonoff-style induction, but is actually computable.]
- Donald S. Ornstein, "Guessing the next output of a stationary process", Israel Journal of Mathematics 30 (1978): 292--296
- Maxim Raginsky, Roummel F. Marcia, Jorge Silva and Rebecca M. Willett, "Sequential Probability Assignment via Online Convex Programming Using Exponential Families" [ISIT 2009; PDF]
- Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari, "Online Learning: Random Averages, Combinatorial Parameters, and Learnability", arxiv:1006.1138
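The Agarwal-Duchi entry above presupposes a learner with low regret against a class of alternatives. A minimal sketch of such a learner is the exponentially weighted average forecaster from Cesa-Bianchi and Lugosi's book: under squared-error loss and a suitable learning rate, its cumulative loss exceeds the best expert's by only a logarithmic-in-the-number-of-experts term. The experts, outcomes, and learning rate here are toy stand-ins.

```python
# Exponentially weighted average forecaster over a finite set of experts,
# with squared-error loss; weights are kept in log space for stability.
import math

def exp_weights(expert_preds, outcomes, eta=0.5):
    """Return (forecaster's cumulative loss, each expert's cumulative loss)."""
    n_experts = len(expert_preds)
    log_w = [0.0] * n_experts
    total_loss = 0.0
    expert_loss = [0.0] * n_experts
    for t, y in enumerate(outcomes):
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]  # normalized below
        z = sum(w)
        pred = sum(wi * expert_preds[i][t] for i, wi in enumerate(w)) / z
        total_loss += (pred - y) ** 2
        for i in range(n_experts):
            li = (expert_preds[i][t] - y) ** 2
            expert_loss[i] += li
            log_w[i] -= eta * li  # exponential weight update
    return total_loss, expert_loss
```

With one expert always predicting 1 and another always predicting 0, on a constant-1 outcome sequence the forecaster's total loss stays within a small constant of the best expert's (which is zero), however long the sequence runs.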
- To read:
- Gérard Biau, Kevin Bleakley, László Györfi, György Ottucsák, "Nonparametric sequential prediction of time series", arxiv:0801.0327
- Pierre Gaillard and Paul Baudin, "A consistent deterministic regression tree for non-parametric prediction of time series", arxiv:1405.1533
- L. Gyorfi, G. Morvai, S. Yakowitz, "Limits to consistent on-line forecasting for ergodic time series", IEEE Transactions on Information Theory 44 (1998): 886--892 = arxiv:0712.2430
- Marcus Hutter
- "Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences," cs.LG/0106036
- "Convergence and Loss Bounds for Bayesian Sequence Prediction," cs.LG/0301014
- "General Loss Bounds for Universal Sequence Prediction," cs.AI/0101019
- "Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet", cs.LG/0311014
- D. Jones, M. Kohler and H. Walk, "Weakly Universally Consistent Forecasting of Stationary and Ergodic Time Series", IEEE Transactions on Information Theory 58 (2012): 1191--1202
- Yuri Kalnishkan, Vladimir Vovk and Michael V. Vyugin, "How many strings are easy to predict?", Information and Computation 201 (2005): 55--71 ["It is well known in the theory of Kolmogorov complexity that most strings cannot be compressed; more precisely, only exponentially few (O(2^(n-m))) binary strings of length n can be compressed by m bits. This paper extends the 'incompressibility' property of Kolmogorov complexity to the 'unpredictability' property of predictive complexity. The 'unpredictability' property states that predictive complexity (defined as the loss suffered by a universal prediction algorithm working infinitely long) of most strings is close to a trivial upper bound (the loss suffered by a trivial minimax constant prediction strategy). We show that only exponentially few strings can be successfully predicted and find the base of the exponent."]
- E. Meron and M. Feder, "Finite-Memory Universal Prediction of Individual Sequences", IEEE Transactions on Information Theory 50 (2004): 1506--1523
- Gusztav Morvai, "Guessing the output of a stationary binary time series", arxiv:0710.3760
- Gusztav Morvai, Sanjeev R. Kulkarni, Andrew B. Nobel, "Regression estimation from an individual stable sequence", Statistics 33 (1999): 99--118 = arxiv:0710.2496
- Gusztáv Morvai and Benjamin Weiss
- "Forecasting for Stationary Binary Time Series", Acta Appl. Math. 79 (2003): 25--34 = arxiv:0710.5144
- "Forward Estimation for Ergodic Time Series", Ann. Inst. H. Poincaré Probab. Statist. 41 (2005): 859--870 = arxiv:0711.3856
- "Inferring the conditional mean", Theory of Stochastic Processes 11 (2005): 112--120 = arxiv:0710.3757
- "Intermittent estimation of stationary time series", Test 13 (2004): 525--542 = arxiv:0711.0350
- "Limitations on intermittent forecasting", Statistics and Probability Letters 72 (2005): 285--290 = arxiv:0710.3773
- "On Classifying Processes", Bernoulli 11 (2005): 523--532 = arxiv:0710.3775
- "On sequential estimation and prediction for discrete time series", Stochastics and Dynamics 7 (2007): 417--437 = arxiv:0803.4332
- "Prediction for discrete time series", Probability Theory and Related Fields 132 (2005): 1--12 = arxiv:0711.0471
- "On universal estimates for binary renewal processes", Annals of Applied Probability 18 (2008): 1970--1992, arxiv:0811.2076
- "Nonparametric sequential prediction for stationary processes", Annals of Probability 39 (2011): 1137--1160
- G. Morvai, S. Yakowitz, L. Gyorfi, "Nonparametric inference for ergodic, stationary time series", Annals of Statistics 24 (1996): 370--379, arxiv:0711.0367
- Andrew B. Nobel, Gusztav Morvai, Sanjeev R. Kulkarni, "Density estimation from an individual numerical sequence", IEEE Transactions on Information Theory 44 (1998): 537--541, arxiv:0710.2500
- Boris Ryabko, "Application of data compression methods to nonparametric estimation of characteristics of discrete-time stochastic processes", Problems of Information Transmission 43 (2003): 367--379
- Boris Ryabko and Jaakko Astola
- "Prediction of Large Alphabet Processes and Its Application to Adaptive Source Coding", cs.IT/0504079
- "Universal Codes as a Basis for Time Series Testing", cs.IT/0602084
- "Universal Codes as a Basis for Nonparametric Testing of Serial Independence for Time Series", cs.IT/0506094
- Daniil Ryabko
- "On Finding Predictors for Arbitrary Families of Processes", Journal of Machine Learning Research 11 (2010): 581--602 = arxiv:0912.4883
- "Sequence prediction in realizable and non-realizable cases", arxiv:1005.5603
- Daniil Ryabko and Marcus Hutter, "On Sequence Prediction for Arbitrary Measures", cs.LG/0606077
- Alessio Sancetta, "Universality of Bayesian Predictions", Bayesian Analysis 7 (2012): 1--36
- Nathan Srebro, Karthik Sridharan, Ambuj Tewari, "On the Universality of Online Mirror Descent", arxiv:1107.4080
- Hayato Takahashi, "Computational limits to nonparametric estimation for ergodic processes", IEEE Transactions on Information Theory 57 (2011): 6995--6999, arxiv:1002.1559
- T. Weissman, "How to Filter an 'Individual Sequence with Feedback'", IEEE Transactions on Information Theory 54 (2008): 3831--3841
- S. Yakowitz, L. Gyorfi, J. Kieffer, G. Morvai, "Strongly consistent nonparametric forecasting and regression for stationary ergodic sequences", J. Multivariate Analysis 71 (1999): 24--41 = arxiv:0712.2592
- Jacob Ziv, "A Universal Prediction Lemma and Applications to Universal Data Compression and Prediction", IEEE Transactions on Information Theory 47 (2001): 1528--1532
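The counting argument behind the incompressibility claim quoted in the Kalnishkan, Vovk and Vyugin entry above is easy to make concrete: there are only 2^0 + 2^1 + ... + 2^(n-m-1) = 2^(n-m) - 1 descriptions shorter than n - m bits, so at most that many of the 2^n binary strings of length n can be compressed by m or more bits. A tiny computation of the resulting bound (function names are mine):

```python
# Counting bound on compressible strings: descriptions shorter than n - m
# bits number 2^(n-m) - 1, so at most that many n-bit strings can be
# compressed by m or more bits.

def max_compressible(n, m):
    """Upper bound on the number of n-bit strings compressible by >= m bits."""
    return 2 ** (n - m) - 1

def fraction_compressible(n, m):
    """That bound as a fraction of all 2^n strings of length n."""
    return max_compressible(n, m) / 2 ** n
```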