CORPUS SELECTION AND DATA COLLECTION METHODS FOR ENGLISH AND UZBEK LANGUAGE DICTIONARIES
Keywords:
corpus linguistics, data collection, lexicography, bilingual dictionary, English, Uzbek, lexical resources, corpus design, language data, digital lexicography
Abstract
This article examines the principles and methods of corpus selection and data collection for developing English and Uzbek dictionaries. It stresses that well-balanced, representative corpora are needed to ensure the accuracy, relevance, and usability of lexical entries. It also discusses the main sources of linguistic data, including written and spoken texts, electronic databases, and user-generated content. By comparing practices in the two languages, the paper identifies effective strategies for compiling bilingual lexicographic resources. The findings are intended to improve dictionary-making practice and to promote the integration of modern corpus linguistics into lexicography.
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.