Stochastic Approximation Algorithms
06 Apr 2023 18:36
Logically, "stochastic approximation" could refer to a great range of things, but in practice it has become something of a technical term for procedures that approximate the solution of an equation, observed through noise, or which try to minimize a function, again observed through noise. The former --- root-finding --- is sometimes known as the Robbins-Monro problem, and the latter --- minimization --- as the Kiefer-Wolfowitz problem. (Tangentially, that Wolfowitz is the father of the Wolfowitz who helped drag us into the invasion of Iraq.) This turns out to be a subject with deep connections to the theory of on-line learning algorithms and recursive estimation, which is really how I became interested in it, but I also like it nowadays because it provides some very cute, yet powerful, probability examples...
- Recommended, big picture:
- Michel Benaïm, "Dynamics of stochastic approximation algorithms", Séminaire de probabilités (Strasbourg) 33 (1999): 1--68 [Link to full text, bibliography, etc.]
- Tze Leung Lai, "Stochastic Approximation", Annals of Statistics 31 (2003): 391--406
- M. B. Nevel'son and R. Z. Hasminskii, Stochastic Approximation and Recursive Estimation [A truly excellent book; I don't suppose anyone has a copy they'd like to sell?]
- Robin Pemantle, "A Survey of Random Processes with Reinforcement", math.PR/0610076
- H. Robbins and S. Monro, "A Stochastic Approximation Method", Annals of Mathematical Statistics 22 (1951): 400--407
- Recommended, close-ups:
- Arthur E. Albert and Leland A. Gardner, Jr., Stochastic Approximation and Nonlinear Regression
- Léon Bottou and Olivier Bosquet, "The Tradeoffs of Large Scale Learning", in Sra et al. (eds.), below [PDF reprint via Dr. Bottou]
- Abdelkader Mokkadem, Mariane Pelletier, Yousri Slaoui
- "The stochastic approximation method for the estimation of a multivariate probability density", arxiv:0807.2960
- "Revisiting Révész's stochastic approximation method for the estimation of a regression function", arxiv:0812.3973
- To read:
- Francis Bach and Eric Moulines, "Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)", arxiv:1306.2119
- Johanna Bertl, Gregory Ewing, Carolin Kosiol, Andreas Futschik, "Approximate Maximum Likelihood Estimation", arxiv:1507.04553
- Vivek S. Borkar, Stochastic Approxmation: A Dynamical Systems Viewpoint
- D. L. Burkholder, "On a Class of Stochastic Approximation Processes", Annals of Mathematical Statistics 27 (1956): 1044--1059
- Sebastien Gadat and Laurent Younnes, "A Stochastic Algorithm for Feature Selection in Pattern Recognition", Journal of Machine Learning Research 8 (2007): 509--547
- Sameer Kamal, "On the convergence, lock-in probability and sample complexity of stochastic approximation", arxiv:1007.4684
- H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications
- Sophie Laruelle and Gilles Pages, "Stochastic Approximation with Averaging Innovation", arxiv:10067.3578
- Abdelkader Mokkadem and Mariane Pelletier
- "Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms", math.PR/0610329 = Annals of Applied Probability 16 (2006): 1671--1702
- "A companion for the Kiefer-Wolfowitz-Blum stochastic approximation algorithm", math.ST/0610487 [Simultaneously estimating both the location and the magnitude of the maximum]
- Deanna Needell, Nathan Srebro, Rachel Ward, "Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm", arxiv:1310.5715
- Mariane Pelletier, "Weak Convergence Rates for Stochastic Approximation with Applications to Multiple Targets and Simulated Annealing", Annals of Applied Probability 8 (1998): 10--44
- D. Saad (ed.), Online Learning of Neural Networks