TRANSFORMER-BASED SPAM DETECTION FOR UZBEK LANGUAGE: A COMPARATIVE STUDY WITH TRADITIONAL MACHINE LEARNING MODELS

Authors

  • Atajonov Muzaffar Ne'matjon ugli, Teacher, Jaloliddin Manguberdi Military-Academic Lyceum, Tashkent, Uzbekistan

Keywords

Uzbek NLP; Spam Detection; BERT; Machine Learning; Text Classification; Transformer; Low-Resource Language

Abstract

Spam detection remains a critical challenge in natural language processing, particularly for low-resource languages such as Uzbek. While traditional machine learning approaches have been widely applied to text classification tasks, their reliance on handcrafted features limits contextual understanding. Recent advances in Transformer-based architectures, especially BERT (Bidirectional Encoder Representations from Transformers), have demonstrated superior performance in capturing semantic relationships within text.

This study proposes a BERT-based spam detection model for Uzbek SMS messages and compares its effectiveness with that of conventional machine learning models: Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression. A labeled Uzbek SMS dataset was split into training and testing subsets. The traditional models were trained on TF-IDF features, while a multilingual BERT model was fine-tuned for binary classification.
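To make the classical pipeline concrete, the following minimal sketch reproduces the TF-IDF baseline setup with scikit-learn; the toy messages, split ratio, and default vectorizer settings are illustrative assumptions rather than details taken from the study.

# Illustrative sketch of the TF-IDF baselines (Naive Bayes, SVM, Logistic
# Regression); toy data and hyperparameters are assumptions, not the
# paper's actual setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for the labeled Uzbek SMS dataset (1 = spam, 0 = ham)
texts = [
    "Tabriklaymiz! Siz 1 000 000 so'm yutib oldingiz, hoziroq qo'ng'iroq qiling",
    "Ertaga soat 9 da darsga kelishni unutmang",
    "Bepul internet paketini olish uchun ushbu havolani bosing",
    "Onam, men kechroq qaytaman",
    "Chegirma! Faqat bugun barcha mahsulotlarga 90% chegirma",
    "Majlis soat 15:00 ga ko'chirildi",
]
labels = [1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=42)

baselines = {
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, clf in baselines.items():
    # Every baseline sees identical TF-IDF features, as in the study
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, pipe.predict(X_test), zero_division=0))

Because each classifier receives identical TF-IDF features, differences in the reported scores isolate the classifier itself rather than the representation.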

Experimental results indicate that the Transformer-based model significantly outperforms classical approaches in terms of accuracy, precision, recall, and F1-score. The findings confirm that contextual embeddings are highly effective for spam detection in morphologically rich and low-resource languages.
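For the Transformer side, a minimal fine-tuning sketch using the Hugging Face Trainer is shown below, computing the same four metrics at evaluation time; the bert-base-multilingual-cased checkpoint, 128-token cap, and training hyperparameters are assumptions, as the abstract does not specify them.

# Hedged sketch of multilingual BERT fine-tuning for binary spam
# classification; checkpoint and hyperparameters are assumptions.
import numpy as np
from datasets import Dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "bert-base-multilingual-cased"  # assumed checkpoint; mBERT covers Uzbek
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Toy stand-ins for the labeled Uzbek SMS dataset (1 = spam, 0 = ham)
train_ds = Dataset.from_dict({
    "text": ["Yutuq! Hoziroq qo'ng'iroq qiling", "Ertaga soat 9 da uchrashamiz"],
    "label": [1, 0],
})
test_ds = Dataset.from_dict({
    "text": ["Bepul sovg'ani olish uchun havolani bosing", "Majlis 15:00 da"],
    "label": [1, 0],
})

def tokenize(batch):
    # SMS messages are short; a 128-token cap is an assumed setting
    return tokenizer(batch["text"], truncation=True, padding="max_length",
                     max_length=128)

train_ds = train_ds.map(tokenize, batched=True)
test_ds = test_ds.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # The four metrics reported in the study
    logits, y_true = eval_pred
    y_pred = np.argmax(logits, axis=-1)
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": p, "recall": r, "f1": f1}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbert-spam", num_train_epochs=3,
                           per_device_train_batch_size=16, learning_rate=2e-5,
                           report_to="none"),
    train_dataset=train_ds,
    eval_dataset=test_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())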

The research contributes to Uzbek NLP by providing one of the first systematic comparative analyses between deep contextual models and traditional machine learning techniques for spam classification. Practical implications include the potential integration of the proposed model into mobile communication platforms. Limitations include dataset size and computational requirements, which future studies may address using lightweight Transformer architectures.
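On the lightweight direction noted above, a distilled multilingual checkpoint can be substituted into the same fine-tuning code by changing a single identifier; the checkpoint below is a public Hugging Face model and is offered as a sketch, not as a configuration evaluated in this study.

# Sketch: swapping in a distilled multilingual checkpoint to cut model
# size and latency; the fine-tuning code above is reused unchanged
# (an assumption).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "distilbert-base-multilingual-cased"  # distilled counterpart of mBERT
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
print(sum(p.numel() for p in model.parameters()), "parameters")

Sanh et al. (2019, cited below) report that distillation yields a model roughly 40% smaller and 60% faster at inference while retaining most of BERT's accuracy, which suits the mobile deployment scenario mentioned above.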

References

Drucker, H.; Wu, D.; Vapnik, V.N. Support vector machines for spam categorization. IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 1048–1054, 1999. DOI: 10.1109/72.788645.

Androutsopoulos, I.; Koutsias, J.; Chandrinos, K.; Spyropoulos, C. An evaluation of Naïve Bayesian anti-spam filtering. Proceedings of the Workshop on Machine Learning in the New Information Age, pp. 9–17, 2000.

Joulin, A.; Grave, E.; Bojanowski, P.; Mikolov, T. Bag of Tricks for Efficient Text Classification. EACL, pp. 427–431, 2017. DOI: 10.18653/v1/E17-2068.

Vaswani, A.; Shazeer, N.; Parmar, N.; et al. Attention Is All You Need. Advances in Neural Information Processing Systems, 2017. DOI: 10.48550/arXiv.1706.03762.

Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT, pp. 4171–4186, 2019. DOI: 10.18653/v1/N19-1423.

Sun, C.; Qiu, X.; Xu, Y.; Huang, X. How to Fine-Tune BERT for Text Classification? China National Conference on Chinese Computational Linguistics, 2019. DOI: 10.48550/arXiv.1905.05583.

Lample, G.; Conneau, A. Cross-lingual Language Model Pretraining. NeurIPS, 2019. DOI: 10.48550/arXiv.1901.07291.

Liu, Y.; Ott, M.; Goyal, N.; et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv, 2019. DOI: 10.48550/arXiv.1907.11692.

Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv, 2019. DOI: 10.48550/arXiv.1910.01108.

Kim, Y. Convolutional Neural Networks for Sentence Classification. EMNLP, pp. 1746–1751, 2014. DOI: 10.3115/v1/D14-1181.

Published

2026-02-16

How to Cite

Atajonov Muzaffar Ne'matjon ugli. (2026). TRANSFORMER-BASED SPAM DETECTION FOR UZBEK LANGUAGE: A COMPARATIVE STUDY WITH TRADITIONAL MACHINE LEARNING MODELS. Ethiopian International Journal of Multidisciplinary Research, 13(2), 921–926. Retrieved from https://eijmr.org/index.php/eijmr/article/view/5177