Optimization
05 Sep 2024 13:28
A more than usually inadequate placeholder.
One important question: why does gradient descent work so well in machine learning, especially for neural networks?
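To fix ideas, here is a minimal sketch (mine, not from any of the references below) of plain gradient descent on a toy least-squares problem; the step size, iteration count, and problem are arbitrary choices for illustration only. The puzzle above is why this kind of update behaves so well on the very non-convex objectives of neural networks.

```python
# Minimal gradient descent sketch on a toy least-squares problem.
# All constants (step size eta, number of steps, problem size) are
# illustrative assumptions, not recommendations.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # design matrix
beta_true = rng.normal(size=5)
y = X @ beta_true + 0.1 * rng.normal(size=100)

def loss(beta):
    # mean squared error, the objective being minimized
    return 0.5 * np.mean((X @ beta - y) ** 2)

def grad(beta):
    # gradient of the loss with respect to beta
    return X.T @ (X @ beta - y) / len(y)

beta = np.zeros(5)
eta = 0.1                                 # fixed step size ("learning rate")
for t in range(1000):
    beta = beta - eta * grad(beta)        # step against the gradient

print(loss(beta), loss(beta_true))        # final loss vs. loss at the truth
```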
- See also:
- Calculus of Variations and Optimal Control Theory
- Computation
- Control Theory
- Decision Theory
- Economics
- Evolutionary Computation
- Learning in Games
- Learning Theory
- Low-Regret Learning
- Math I Ought to Learn
- Planned Economies
- Stochastic Approximation
- Recommended, big picture:
- Aharon Ben-Tal and Arkadi Nemirovski, Lectures on Modern Convex Optimization [PDF via Prof. Nemirovski]
- Cristopher Moore and Stephan Mertens, The Nature of Computation [Cris and Stephan were kind enough to let me read this in manuscript; it's magnificent. Review: Intellects Vast and Warm and Sympathetic]
- Recommended, close-ups:
- Alekh Agarwal, Peter L. Bartlett, Pradeep Ravikumar, Martin J. Wainwright, "Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization", arxiv:1009.0571
- Aureli Alabert, Alessandro Berti, Ricard Caballero, Marco Ferrante, "No-Free-Lunch Theorems in the continuum", arxiv:1409.2175
- Léon Bottou and Olivier Bousquet, "The Tradeoffs of Large Scale Learning", in Sra et al. (eds.), below [PDF reprint via Dr. Bottou]
- Venkat Chandrasekaran and Michael I. Jordan, "Computational and Statistical Tradeoffs via Convex Relaxation", Proceedings of the National Academy of Sciences (USA) 110 (2013): E1181--E1190, arxiv:1211.1073
- Paul Mineiro, Nikos Karampatziakis, "Loss-Proportional Subsampling for Subsequent ERM", arxiv:1306.1840
- Maxim Raginsky, Alexander Rakhlin, "Information-based complexity, feedback and dynamics in convex programming", arxiv:1010.2285
- Suvrit Sra, Sebastian Nowozin and Stephen J. Wright (eds.), Optimization for Machine Learning
- Ben Recht, Convex Optimization Live Blog (2024)
- Margaret H. Wright, "The interior-point revolution in optimization: History, recent developments, and lasting consequences", Bulletin of the American Mathematical Society (New Series) 42 (2005): 39--56
- David H. Wolpert and William G. Macready, "No free lunch theorems for optimization", IEEE Transactions on Evolutionary Computation 1 (1997): 67--82, [SFI Working Paper]
- Recommended, historical interest:
- Richard Bellman (ed.), Mathematical Optimization Techniques [University of California Press, 1963]
- To read, historical interest:
- George B. Dantzig, Linear Programming and Extensions
- Giorgio Giorgi and Tinne Hoff Kjeldsen (eds.), Traces and Emergence of Nonlinear Programming
- I. Gumowski and C. Mira, Optimization in Control Theory and Practice
- William Thomas, Rational Action: The Sciences of Policy in Britain and America, 1940--1960
- To read, textbooks/big picture:
- Dimitri P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods [PDF reprint via Prof. Bertsekas]
- Stephen Boyd and Lieven Vandenberghe, Convex Optimization [Yes, I should have finished this decades ago; no, I haven't]
- Jan Brinkhuis and Vladimir Tikhomirov, Optimization: Insights and Applications
- Giuseppe C. Calafiore and Laurent El Ghaoui, Optimization Models
- Paulo Cortez, Modern Optimization with R
- B. Guenin, J. Könemann and L. Tunçel, A Gentle Introduction to Optimization
- Andrzej Ruszczynski, Nonlinear Optimization
- James C. Spall, Introduction to Stochastic Search and Optimization
- Rangarajan K. Sundaram, A First Course in Optimization Theory
- To read, close-ups:
- Dennis Amelunxen, Martin Lotz, Michael B. McCoy, Joel A. Tropp, "Living on the edge: Phase transitions in convex programs with random data", arxiv:1303.6672
- Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas, "Learning to learn by gradient descent by gradient descent", arxiv:1606.04474
- G. Ausiello, Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties
- Francis Bach, Rodolphe Jenatton, Julien Mairal, Guillaume Obozinski, "Optimization with Sparsity-Inducing Penalties", Foundations and Trends in Machine Learning 4 (2011): 1--106, arxiv:1108.0775
- Aharon Ben-Tal, Laurent El Ghaoui and Arkadi Nemirovski, Robust Optimization
- Yatao Bian, Xiong Li, Yuncai Liu, Ming-Hsuan Yang, "Parallel Coordinate Descent Newton Method for Efficient $\ell_1$-Regularized Minimization", arxiv:1306.4080
- Sébastien Bubeck
- Introduction to Online Optimization
- "Convex Optimization: Algorithms and Complexity", Foundations and Trends in Machine Learning 8 (2015): 231--357, arxiv:1405.4980
- Rocco Caprio, Juan Kuntz, Samuel Power, Adam M. Johansen, "Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities", arxiv:2403.02004
- C. Cartis, N. I. M. Gould and Ph. L. Toint, "On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization" [PDF preprint via Dr. Cartis]
- Miguel Á. Carreira-Perpiñán and Weiran Wang, "Distributed optimization of deeply nested systems", arxiv:1212.5921 [Remarkable if true]
- Sourav Chatterjee, "Convergence of gradient descent for deep neural networks", arxiv:2203.16462
- Moritz Hardt, Tengyu Ma, Benjamin Recht, "Gradient Descent Learns Linear Dynamical Systems", Journal of Machine Learning Research 19 (2018): 29
- Elad Hazan, Satyen Kale, "Projection-free Online Learning", arxiv:1206.4657
- Philipp Hennig and Martin Kiefel, "Quasi-Newton Methods: A New Direction", arxiv:1206.4602
- Prateek Jain, Purushottam Kar, "Non-convex Optimization for Machine Learning", arxiv:1712.07897 (195 pp. review)
- Adel Javanmard, Andrea Montanari, and Federico Ricci-Tersenghi, "Phase transitions in semidefinite relaxations", Proceedings of the National Academy of Sciences 113 (2016): E2218--E2223
- Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade and Michael I. Jordan, "On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points", Journal of the ACM 68 (2021): 11
- Michael I. Jordan, Guy Kornowski, Tianyi Lin, Ohad Shamir, Manolis Zampetakis, "Deterministic Nonsmooth Nonconvex Optimization", arxiv:2302.08300
- Soeren Laue, "A Hybrid Algorithm for Convex Semidefinite Optimization", arxiv:1206.4608
- Adrian S. Lewis, Tonghua Tian, "The complexity of first-order optimization methods from a metric perspective", arxiv:2305.03208
- Mehrdad Mahdavi, Rong Jin, "MixedGrad: An O(1/T) Convergence Rate Algorithm for Stochastic Smooth Optimization", arxiv:1307.7192
- Mehrdad Mahdavi, Rong Jin, Tianbao Yang, "Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints", arxiv:1111.6082
- Luke Metz, C. Daniel Freeman, Niru Maheswaranathan, Jascha Sohl-Dickstein, "Training Learned Optimizers with Randomly Initialized Learned Optimizers", arxiv:2101.07367
- N. Parikh and S. Boyd, "Proximal Algorithms", Foundations and Trends in Optimization 1 (2014): 123--231 [PDF reprint and other resources via Prof. Boyd]
- Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington, "Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions", arxiv:2206.07252
- Luc Pronzato, Henry P. Wynn and Anatoly A. Zhigljavsky, Dynamical Search: Applications of Dynamical Systems in Search and Optimization
- Debasish Roy and G. Visweswara Rao, Stochastic Dynamics, Filtering and Optimization
- Levent Sagun, V. Ugur Guney, Gerard Ben Arous, Yann LeCun, "Explorations on high dimensional landscapes", arxiv:1412.6615
- Ohad Shamir, Tong Zhang, "Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes", arxiv:1212.1824
- Rachael Tappenden, Peter Richtárik, Jacek Gondzio, "Inexact Coordinate Descent: Complexity and Preconditioning", arxiv:1304.5530
- Matus Telgarsky, "Stochastic linear optimization never overfits with quadratically-bounded losses on general data", arxiv:2202.06915
- Belinda Tzen, Anant Raj, Maxim Raginsky, Francis Bach, "Variational Principles for Mirror Descent and Mirror Langevin Dynamics", arxiv:2303.09532
- Matthew D. Zeiler, "ADADELTA: An Adaptive Learning Rate Method", arxiv:1212.5701
- Lijun Zhang, Tianbao Yang, Rong Jin, Xiaofei He, "O(logT) Projections for Stochastic Optimization of Smooth and Strongly Convex Functions", arxiv:1304.0740
- Hui Zhang, Wotao Yin, "Gradient methods for convex minimization: better rates under weaker conditions", arxiv:1303.4645