MATHEMATICAL ANALYSIS OF CONVERGENCE FOR OPTIMIZATION ALGORITHMS IN NEURAL NETWORK TRAINING
Keywords:
Neural networks, optimization, convergence analysis, gradient descent, stochastic optimization, saddle points, L-smoothness, Adam algorithm.

Abstract
This paper presents a rigorous mathematical analysis of the convergence properties of key optimization algorithms used in neural network training. The study investigates the dynamics of Gradient Descent (GD), Stochastic Gradient Descent (SGD), and Adam within non-convex loss landscapes. The analysis reveals that stochastic methods possess a distinct advantage in escaping saddle points via gradient noise, while adaptive methods significantly accelerate the convergence rate through coordinate-wise normalization. The results provide a theoretical foundation for the trade-off between optimization speed and the generalization capability of deep learning models.
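The two phenomena the abstract highlights can be illustrated with a minimal numerical sketch. The example below is illustrative only (the function, step sizes, and noise level are assumptions, not taken from the paper): on the canonical saddle f(x, y) = x² − y², plain gradient descent initialized exactly at the saddle point has zero gradient and never moves, while the same iteration with small Gaussian perturbations of the gradient (mimicking mini-batch noise in SGD) drifts off the saddle along the escape direction. The `adam_step` function implements the standard Adam update with bias correction, showing how coordinate-wise normalization makes the effective step size roughly equal to the learning rate in every coordinate, regardless of the raw gradient magnitude.

```python
import math
import random

def grad(p):
    # Gradient of f(x, y) = x**2 - y**2, which has a saddle at the origin.
    x, y = p
    return (2.0 * x, -2.0 * y)

def gd(p, lr=0.1, steps=100):
    # Deterministic gradient descent: at the saddle the gradient is exactly
    # zero, so the iterate is a fixed point and never escapes.
    for _ in range(steps):
        gx, gy = grad(p)
        p = (p[0] - lr * gx, p[1] - lr * gy)
    return p

def noisy_gd(p, lr=0.1, steps=100, sigma=1e-3, seed=0):
    # The same iteration with small Gaussian gradient noise, a stand-in for
    # mini-batch noise in SGD; the noise breaks the symmetry and the iterate
    # moves off the saddle along the descent direction in y.
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(p)
        p = (p[0] - lr * (gx + rng.gauss(0.0, sigma)),
             p[1] - lr * (gy + rng.gauss(0.0, sigma)))
    return p

def adam_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update (Kingma & Ba): exponential moving averages of the
    # gradient (m) and squared gradient (v), bias-corrected, then a
    # coordinate-wise normalized step of magnitude roughly lr.
    m = tuple(b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g))
    v = tuple(b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g))
    m_hat = tuple(mi / (1 - b1 ** t) for mi in m)
    v_hat = tuple(vi / (1 - b2 ** t) for vi in v)
    p = tuple(pi - lr * mh / (math.sqrt(vh) + eps)
              for pi, mh, vh in zip(p, m_hat, v_hat))
    return p, m, v

start = (0.0, 0.0)           # exactly at the saddle point
print(gd(start))             # stays at (0.0, 0.0): zero gradient, no escape
print(noisy_gd(start))       # |y| grows: gradient noise escapes the saddle
```

Note how in `adam_step` at t = 1 the bias-corrected estimates satisfy m̂ = g and v̂ = g², so the step in each coordinate is approximately lr · sign(g): a gradient of 10 and a gradient of 0.001 both produce a step of about 0.001, which is precisely the coordinate-wise normalization the abstract refers to.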





