Stochastic Approximation Algorithms

Last update: 21 Apr 2025 21:17
First version:

Logically, "stochastic approximation" could refer to a great range of things, but in practice it has become something of a technical term for procedures that approximate the solution of an equation, observed through noise, or which try to minimize a function, again observed through noise. The former --- root-finding --- is sometimes known as the Robbins-Monro problem, and the latter --- minimization --- as the Kiefer-Wolfowitz problem. (Tangentially, that Wolfowitz is the father of the Wolfowitz who helped drag us into the invasion of Iraq.) This turns out to be a subject with deep connections to the theory of on-line learning algorithms and recursive estimation, which is really how I became interested in it, but I also like it nowadays because it provides some very cute, yet powerful, probability examples...

Michel Benaïm, "Dynamics of stochastic approximation algorithms", Séminaire de probabilités (Strasbourg) 33 (1999): 1--68 [Link to full text, bibliography, etc.]
Tze Leung Lai, "Stochastic Approximation", Annals of Statistics 31 (2003): 391--406
M. B. Nevel'son and R. Z. Hasminskii, Stochastic Approximation and Recursive Estimation [A truly excellent book; I don't suppose anyone has a copy they'd like to sell?]
Robin Pemantle, "A Survey of Random Processes with Reinforcement", math.PR/0610076
H. Robbins and S. Monro, "A Stochastic Approximation Method", Annals of Mathematical Statistics 22 (1951): 400--407

Arthur E. Albert and Leland A. Gardner, Jr., Stochastic Approximation and Nonlinear Regression
Léon Bottou and Olivier Bosquet, "The Tradeoffs of Large Scale Learning", in Sra et al. (eds.), below [PDF reprint via Dr. Bottou]
Abdelkader Mokkadem, Mariane Pelletier, Yousri Slaoui
- "The stochastic approximation method for the estimation of a multivariate probability density", arxiv:0807.2960
- "Revisiting Révész's stochastic approximation method for the estimation of a regression function", arxiv:0812.3973

Francis Bach and Eric Moulines, "Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)", arxiv:1306.2119
Johanna Bertl, Gregory Ewing, Carolin Kosiol, Andreas Futschik, "Approximate Maximum Likelihood Estimation", arxiv:1507.04553
Vivek S. Borkar, Stochastic Approxmation: A Dynamical Systems Viewpoint
D. L. Burkholder, "On a Class of Stochastic Approximation Processes", Annals of Mathematical Statistics 27 (1956): 1044--1059
Sebastien Gadat and Laurent Younnes, "A Stochastic Algorithm for Feature Selection in Pattern Recognition", Journal of Machine Learning Research 8 (2007): 509--547
Sameer Kamal, "On the convergence, lock-in probability and sample complexity of stochastic approximation", arxiv:1007.4684
H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications
Sophie Laruelle and Gilles Pages, "Stochastic Approximation with Averaging Innovation", arxiv:10067.3578
Abdelkader Mokkadem and Mariane Pelletier
- "Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms", math.PR/0610329 = Annals of Applied Probability 16 (2006): 1671--1702
- "A companion for the Kiefer-Wolfowitz-Blum stochastic approximation algorithm", math.ST/0610487 [Simultaneously estimating both the location and the magnitude of the maximum]
Deanna Needell, Nathan Srebro, Rachel Ward, "Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm", arxiv:1310.5715
Mariane Pelletier, "Weak Convergence Rates for Stochastic Approximation with Applications to Multiple Targets and Simulated Annealing", Annals of Applied Probability 8 (1998): 10--44
D. Saad (ed.), Online Learning of Neural Networks