MATHEMATICAL ANALYSIS OF CONVERGENCE FOR OPTIMIZATION ALGORITHMS IN NEURAL NETWORK TRAINING
Keywords:
Neural networks, optimization, convergence analysis, gradient descent, stochastic optimization, saddle points, L-smoothness, Adam algorithm.

Abstract
This paper presents a rigorous mathematical analysis of the convergence properties of key optimization algorithms used in neural network training. The study investigates the dynamics of Gradient Descent (GD), Stochastic Gradient Descent (SGD), and Adam within non-convex loss landscapes. The analysis reveals that stochastic methods possess a distinct advantage in escaping saddle points via gradient noise, while adaptive methods significantly accelerate the convergence rate through coordinate-wise normalization. The results provide a theoretical foundation for the trade-off between optimization speed and the generalization capability of deep learning models.
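The two phenomena the abstract highlights can be illustrated with a minimal numerical sketch. The example below is illustrative only (the function, step sizes, and noise level are assumptions, not taken from the paper): on the canonical saddle f(x, y) = x² − y², plain gradient descent initialized exactly at the saddle point has zero gradient and never moves, while the same iteration with small Gaussian perturbations of the gradient (mimicking mini-batch noise in SGD) drifts off the saddle along the escape direction. The `adam_step` function implements the standard Adam update with bias correction, showing how coordinate-wise normalization makes the effective step size roughly equal to the learning rate in every coordinate, regardless of the raw gradient magnitude.

```python
import math
import random

def grad(p):
    # Gradient of f(x, y) = x**2 - y**2, which has a saddle at the origin.
    x, y = p
    return (2.0 * x, -2.0 * y)

def gd(p, lr=0.1, steps=100):
    # Deterministic gradient descent: at the saddle the gradient is exactly
    # zero, so the iterate is a fixed point and never escapes.
    for _ in range(steps):
        gx, gy = grad(p)
        p = (p[0] - lr * gx, p[1] - lr * gy)
    return p

def noisy_gd(p, lr=0.1, steps=100, sigma=1e-3, seed=0):
    # The same iteration with small Gaussian gradient noise, a stand-in for
    # mini-batch noise in SGD; the noise breaks the symmetry and the iterate
    # moves off the saddle along the descent direction in y.
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(p)
        p = (p[0] - lr * (gx + rng.gauss(0.0, sigma)),
             p[1] - lr * (gy + rng.gauss(0.0, sigma)))
    return p

def adam_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update (Kingma & Ba): exponential moving averages of the
    # gradient (m) and squared gradient (v), bias-corrected, then a
    # coordinate-wise normalized step of magnitude roughly lr.
    m = tuple(b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g))
    v = tuple(b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g))
    m_hat = tuple(mi / (1 - b1 ** t) for mi in m)
    v_hat = tuple(vi / (1 - b2 ** t) for vi in v)
    p = tuple(pi - lr * mh / (math.sqrt(vh) + eps)
              for pi, mh, vh in zip(p, m_hat, v_hat))
    return p, m, v

start = (0.0, 0.0)           # exactly at the saddle point
print(gd(start))             # stays at (0.0, 0.0): zero gradient, no escape
print(noisy_gd(start))       # |y| grows: gradient noise escapes the saddle
```

Note how in `adam_step` at t = 1 the bias-corrected estimates satisfy m̂ = g and v̂ = g², so the step in each coordinate is approximately lr · sign(g): a gradient of 10 and a gradient of 0.001 both produce a step of about 0.001, which is precisely the coordinate-wise normalization the abstract refers to.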





