MFCC–CNN BASED SPEECH RECOGNITION SYSTEM FOR AN INTELLIGENT MOBILE ROBOT DESIGNED FOR UZBEK LANGUAGE PROCESSING
Keywords:
speech recognition, MFCC, CNN, HMM, DTW, mobile robot, acoustic modeling, Uzbek language, hearing-impaired children.

Abstract
This paper presents the mathematical and algorithmic foundations of an intelligent mobile robot designed for automatic speech recognition (ASR) and speech correction in the Uzbek language. A large-scale acoustic dataset of children’s speech was processed using Mel-Frequency Cepstral Coefficients (MFCC), formant analysis, energy parameters, and temporal features. A hybrid recognition pipeline combining classical techniques (DTW, HMM) and a proposed MFCC–CNN deep learning architecture was developed. Experiments were conducted with 25 hearing-impaired children and 30 participants providing command words. Results demonstrate that the proposed system significantly improves speech clarity and recognition accuracy: average articulation accuracy increased from 61.8% to 86.7%, while FAR and FRR values decreased to 0.11 and 0.07, respectively. The findings confirm the applicability of MFCC–CNN models in robotic speech interfaces for the Uzbek language.
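The classical DTW component of the hybrid pipeline aligns a spoken command's feature sequence against stored templates despite differences in speaking rate. As an illustration only (the paper does not publish its implementation), a minimal pure-Python sketch of the standard DTW recurrence on 1-D feature sequences might look like this; in the actual system, the local distance would be computed between per-frame MFCC vectors rather than scalars:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D feature sequences.

    Illustrative sketch: cost[i][j] holds the minimum cumulative cost of
    aligning the first i elements of `a` with the first j elements of `b`.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance; for MFCC frames this would be a Euclidean
            # distance between coefficient vectors instead of abs().
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # insertion (stretch a)
                cost[i][j - 1],      # deletion (stretch b)
                cost[i - 1][j - 1],  # match
            )
    return cost[n][m]
```

For template matching, the command word whose template yields the smallest `dtw_distance` to the input utterance is selected; note that identical sequences score 0, and sequences differing only in tempo score much lower than genuinely different words.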
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.