## Universal Prediction Algorithms

*27 Feb 2017 16:30*

*Given*: a single time series, perhaps a very long one, from a stochastic
process which is basically unknown; perhaps all we know is that it is
stationary and ergodic.

*Desired*: a forecast which will converge on the best possible
forecast, as the series becomes longer and longer. Or: the best possible
forecast from within a fixed class of forecasting algorithms.

A solution is called a *universal* prediction algorithm because it
applies equally to all the processes within the class, and is not tailored to
any one of them.
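
The second, fixed-class version of the problem can be made concrete with a minimal sketch (mine, not from any particular paper): the exponentially weighted average forecaster analyzed at length in Cesa-Bianchi and Lugosi's *Prediction, Learning, and Games*, which combines a finite set of rival forecasters and whose regret against the best of them grows only like the square root of the sequence length, whatever the data source. The function name and interface here are made up for illustration.

```python
import math

def exp_weights_forecast(experts, outcomes, eta=None):
    """Exponentially weighted average forecaster under squared loss.

    experts  -- list of functions mapping the history (a list) to a
                forecast in [0, 1]
    outcomes -- the realized sequence, revealed one value at a time
    Regret against the best single expert is O(sqrt(T log N)), so the
    average loss converges to that of the best expert in the class.
    """
    n, T = len(experts), len(outcomes)
    if eta is None:
        eta = math.sqrt(8.0 * math.log(n) / max(T, 1))  # standard tuning
    cum_loss = [0.0] * n   # cumulative squared loss of each expert
    forecasts = []
    for t, y in enumerate(outcomes):
        history = list(outcomes[:t])
        preds = [f(history) for f in experts]
        # weight each expert by exp(-eta * its cumulative loss so far)
        w = [math.exp(-eta * L) for L in cum_loss]
        total = sum(w)
        forecasts.append(sum(wi * p for wi, p in zip(w, preds)) / total)
        for i, p in enumerate(preds):
            cum_loss[i] += (p - y) ** 2
    return forecasts
```

With the two constant experts "always 0" and "always 1" on a stream of ones, the combined forecast starts at 0.5 and is pulled toward 1 as the always-1 expert accumulates less loss.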

This has connections to information theory (via universal compression algorithms), to the problem of finding Markovian representations and inference for Markov models, and to many other topics. Presumably they could be used for bootstrapping time series.
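The compression connection runs through sequential probability assignment: any scheme that assigns conditional probabilities to the next symbol is simultaneously a predictor and (via arithmetic coding) a compressor. A minimal sketch of the idea, assuming a binary alphabet and a fixed-order Markov context (the function name and defaults are my own choices): the Krichevsky-Trofimov add-1/2 estimator, whose counts define a universal code for Markov sources of that order.

```python
from collections import defaultdict

def kt_markov_forecasts(bits, order=1):
    """Sequentially estimate P(next bit = 1) from add-1/2
    (Krichevsky-Trofimov) counts in a fixed-order Markov context.

    The same conditional probabilities, fed to an arithmetic coder,
    yield a universal code for Markov sources of this order: the code
    length is the sum of -log2 of the probability assigned to each
    realized outcome.
    """
    counts = defaultdict(lambda: [0, 0])   # context -> [#zeros, #ones]
    preds = []
    for t, b in enumerate(bits):
        ctx = tuple(bits[max(0, t - order):t])
        n0, n1 = counts[ctx]
        preds.append((n1 + 0.5) / (n0 + n1 + 1.0))  # KT estimate
        counts[ctx][b] += 1                         # update after predicting
    return preds
```

On a deterministically alternating sequence 0, 1, 0, 1, ... with `order=1`, the forecasts start at the uninformative 1/2 and converge on the correct conditional values 0 and 1.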

See also: Ergodic Theory; Learning in Games; Learning Theory; Low-Regret Learning; Machine Learning, Statistical Inference and Induction; Sequential Decisions Under Uncertainty; Time series

- Recommended:
- Alekh Agarwal and John C. Duchi, "The Generalization Ability of Online Algorithms for Dependent Data", arxiv:1110.2529 [If you have a learning algorithm which achieves low regret against some class of alternatives, and the data source is suitably strongly-mixing, your algorithm will generalize almost as well as the best rival.]
- Paul H. Algoet
    - "Universal Schemes for Prediction, Gambling, and Portfolio Selection", Annals of Probability **20** (1992): 901--941, and an important Correction, **23** (1995): 474--478
    - "Universal Schemes for Learning the Best Nonlinear Predictor Given the Infinite Past and Side Information", IEEE Transactions on Information Theory **45** (1999): 1165--1185
- Pierre Alquier and Olivier Wintenberger, "Model selection and randomization for weakly dependent time series forecasting", Bernoulli **18** (2012): 883--913, arxiv:0902.2924
- Nicolo Cesa-Bianchi and Gabor Lugosi, Prediction, Learning, and Games [Mini-review]
- Imre Csiszar and Zsolt Talata, "On Rate of Convergence of Statistical Estimation of Stationary Ergodic Processes", IEEE Transactions on Information Theory **56** (2010): 3637--3641
- Shane Legg, "Is There an Elegant Universal Theory of Prediction?", cs.AI/0606070 [A nice set of diagonalization arguments against the hope of a universal prediction scheme which has the nice features of Solomonoff-style induction, but is actually computable.]
- Donald Ornstein and Benjamin Weiss, "How Sampling Reveals a Process", Annals of Probability **18** (1990): 905--930 [Open access. A truly beautiful and inspiring paper. The negative results here would seem to depend on their very strong notion of what it means to reconstruct a process. For instance, while in their sense it is not possible to always discriminate between two processes (unless they are Bernoulli), Ryabko and Ryabko (arxiv:0804.0510) give a consistent test to do just this for ergodic processes, not necessarily Bernoulli, by employing a weaker notion of inter-process distance.]
- Maxim Raginsky, Roummel F. Marcia, Jorge Silva and Rebecca M. Willett, "Sequential Probability Assignment via Online Convex Programming Using Exponential Families" [ISIT 2009; PDF]
- Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari, "Online Learning: Random Averages, Combinatorial Parameters, and Learnability", arxiv:1006.1138
- Vladimir Vovk, Alex Gammerman and Glenn Shafer, Algorithmic Learning in a Random World [Mini-review]

- To read:
- Gérard Biau, Kevin Bleakley, László Györfi, György Ottucsák, "Nonparametric sequential prediction of time series", arxiv:0801.0327
- Pierre Gaillard, Paul Baudin, "A consistent deterministic regression tree for non-parametric prediction of time series", arxiv:1405.1533
- L. Gyorfi, G. Morvai, S. Yakowitz, "Limits to consistent on-line forecasting for ergodic time series", IEEE Transactions on Information Theory **44** (1998): 886--892 = arxiv:0712.2430
- Marcus Hutter
    - "Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences", cs.LG/0106036
    - "Convergence and Loss Bounds for Bayesian Sequence Prediction", cs.LG/0301014
    - "General Loss Bounds for Universal Sequence Prediction", cs.AI/0101019
    - "Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet", cs.LG/0311014

- D. Jones, M. Kohler and H. Walk, "Weakly Universally Consistent Forecasting of Stationary and Ergodic Time Series", IEEE Transactions on Information Theory **58** (2012): 1191--1202
- Yuri Kalnishkan, Vladimir Vovk and Michael V. Vyugin, "How many strings are easy to predict?", Information and Computation **201** (2005): 55--71 ["It is well known in the theory of Kolmogorov complexity that most strings cannot be compressed; more precisely, only exponentially few (O(2^(n-m))) binary strings of length n can be compressed by m bits. This paper extends the 'incompressibility' property of Kolmogorov complexity to the 'unpredictability' property of predictive complexity. The 'unpredictability' property states that predictive complexity (defined as the loss suffered by a universal prediction algorithm working infinitely long) of most strings is close to a trivial upper bound (the loss suffered by a trivial minimax constant prediction strategy). We show that only exponentially few strings can be successfully predicted and find the base of the exponent."]
- Gusztav Morvai, "Guessing the output of a stationary binary time series", arxiv:0710.3760
- Gusztav Morvai, Sanjeev R. Kulkarni, Andrew B. Nobel, "Regression estimation from an individual stable sequence", Statistics **33** (1999): 99--118 = arxiv:0710.2496
- Gusztáv Morvai and Benjamin Weiss
    - "Forecasting for Stationary Binary Time Series", Acta Appl. Math. **79** (2003): 25--34 = arxiv:0710.5144
    - "Forward Estimation for Ergodic Time Series", Ann. Inst. H. Poincaré Probab. Statist. **41** (2005): 859--870 = arxiv:0711.3856
    - "Inferring the conditional mean", Theory of Stochastic Processes **11** (2005): 112--120 = arxiv:0710.3757
    - "Intermittent estimation of stationary time series", Test **13** (2004): 525--542 = arxiv:0711.0350
    - "Limitations on intermittent forecasting", Statistics and Probability Letters **72** (2005): 285--290 = arxiv:0710.3773
    - "On Classifying Processes", Bernoulli **11** (2005): 523--532 = arxiv:0710.3775
    - "On sequential estimation and prediction for discrete time series", Stochastics and Dynamics **7** (2007): 417--437 = arxiv:0803.4332
    - "Prediction for discrete time series", Probability Theory and Related Fields **132** (2005): 1--12 = arxiv:0711.0471
    - "On universal estimates for binary renewal processes", Annals of Applied Probability **18** (2008): 1970--1992, arxiv:0811.2076
    - "Nonparametric sequential prediction for stationary processes", Annals of Probability **39** (2011): 1137--1160
- G. Morvai, S. Yakowitz, L. Gyorfi, "Nonparametric inference for ergodic, stationary time series", Annals of Statistics **24** (1996): 370--379, arxiv:0711.0367
- Andrew B. Nobel, Gusztav Morvai, Sanjeev R. Kulkarni, "Density estimation from an individual numerical sequence", IEEE Transactions on Information Theory **44** (1998): 537--541, arxiv:0710.2500
- Boris Ryabko, "Application of data compression methods to nonparametric estimation of characteristics of discrete-time stochastic processes", Problems of Information Transmission **43** (2003): 367--379
- Boris Ryabko and Jaakko Astola
    - "Prediction of Large Alphabet Processes and Its Application to Adaptive Source Coding", cs.IT/0504079
    - "Universal Codes as a Basis for Time Series Testing", cs.IT/0602084
    - "Universal Codes as a Basis for Nonparametric Testing of Serial Independence for Time Series", cs.IT/0506094

- Daniil Ryabko
    - "On Finding Predictors for Arbitrary Families of Processes", Journal of Machine Learning Research **11** (2010): 581--602 = arxiv:0912.4883
    - "Sequence prediction in realizable and non-realizable cases", arxiv:1005.5603
- Daniil Ryabko and Marcus Hutter, "On Sequence Prediction for Arbitrary Measures", cs.LG/0606077
- Alessio Sancetta, "Universality of Bayesian Predictions", Bayesian Analysis **7** (2012): 1--36
- Nathan Srebro, Karthik Sridharan, Ambuj Tewari, "On the Universality of Online Mirror Descent", arxiv:1107.4080
- Hayato Takahashi, "Computational limits to nonparametric estimation for ergodic processes", IEEE Transactions on Information Theory **57** (2011): 6995--6999, arxiv:1002.1559
- T. Weissman, "How to Filter an 'Individual Sequence with Feedback'", IEEE Transactions on Information Theory **54** (2008): 3831--3841
- S. Yakowitz, L. Gyorfi, J. Kieffer, G. Morvai, "Strongly consistent nonparametric forecasting and regression for stationary ergodic sequences", J. Multivariate Analysis **71** (1999): 24--41 = arxiv:0712.2592
- Jacob Ziv, "A Universal Prediction Lemma and Applications to Universal Data Compression and Prediction", IEEE Transactions on Information Theory **47** (2001): 1528--1532