STAGES OF FORMING AND DIGITIZING THE AUTHOR CORPUS

Shohista Akramova Islom qizi

Authors

Shohista Akramova Islom qizi, Lecturer, Asia International University, Uzbekistan

Keywords:

corpus linguistics, author corpus, digitization, metadata, tokenization, normalization, concordance, linguostatistics, idiolect, stylometry, Uzbek language, digital humanities, lexical analysis, indexing, Nusratulla Jumaxo‘ja

Abstract

This study examines the stages of forming and digitizing the Nusratulla Jumaxo‘ja author corpus within the framework of modern corpus linguistics and digital humanities. The research focuses on the scientific and methodological principles of corpus creation: source collection, metadata development, text normalization, tokenization, indexing, concordance generation, and statistical analysis. Particular attention is paid to the challenges of digitizing Uzbek-language texts, especially OCR accuracy, Unicode standardization, and the encoding of apostrophe-like characters (as in o‘ and g‘) in the Uzbek Latin script. The study demonstrates that the author corpus is not merely an electronic archive but a multilayered linguistic platform designed for linguostatistical, stylistic, and semantic analysis. Through concordance and frequency-based analysis, the corpus makes it possible to identify the author’s idiolect, dominant lexical units, and discursive strategies. The integration of metadata and etymological modules further extends the analytical capabilities of the system. The study concludes that the Nusratulla Jumaxo‘ja author corpus is an important digital resource for Uzbek linguistics, stylometry, lexicography, and corpus-based literary studies, and that it offers a methodological model for developing future author corpora in Uzbek corpus linguistics.
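The processing stages named above (normalization, tokenization, indexing, concordance generation, frequency analysis) can be illustrated with a minimal Python sketch. It is not the implementation used for the Nusratulla Jumaxo‘ja corpus. The `Document` metadata fields, the apostrophe convention (U+02BB after o/g, U+02BC elsewhere), the letter-only tokenizer, the function names and the sample sentences are all assumptions chosen for illustration. Each concordance line carries a `doc_id`, which is how a KWIC hit can be linked back to its metadata record.

```python
import re
import unicodedata
from collections import Counter, defaultdict
from dataclasses import dataclass

# Assumed normalization convention (hypothetical, for illustration):
# o/g followed by any apostrophe-like mark -> U+02BB (as in oʻ, gʻ);
# any other apostrophe-like mark (tutuq belgisi) -> U+02BC.
OG_MARK, TUTUQ = "\u02bb", "\u02bc"
APOS_RE = re.compile("([OoGg])?(['`\u2018\u2019\u02bb\u02bc])")
# Letters only; U+02BB and U+02BC are Unicode letters (category Lm), so
# they stay inside tokens such as "soʻz" or "maʼno".
TOKEN_RE = re.compile(r"[^\W\d_]+")


@dataclass
class Document:
    """Hypothetical metadata record for one text of the author corpus."""
    doc_id: str
    title: str
    year: int
    genre: str
    text: str


def normalize(text: str) -> str:
    """Apply NFC, then unify apostrophe variants in a single regex pass."""
    text = unicodedata.normalize("NFC", text)
    return APOS_RE.sub(
        lambda m: (m.group(1) + OG_MARK) if m.group(1) else TUTUQ, text)


def tokenize(text: str) -> list[str]:
    """Normalize, then split into lowercase word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(normalize(text))]


def build_index(docs):
    """Inverted index: token -> list of (doc_id, token position)."""
    index = defaultdict(list)
    tokens_by_doc = {}
    for doc in docs:
        tokens = tokenize(doc.text)
        tokens_by_doc[doc.doc_id] = tokens
        for pos, tok in enumerate(tokens):
            index[tok].append((doc.doc_id, pos))
    return index, tokens_by_doc


def concordance(keyword, index, tokens_by_doc, width=3):
    """KWIC lines as (doc_id, left context, keyword, right context)."""
    key = normalize(keyword).lower()
    lines = []
    for doc_id, pos in index.get(key, []):
        toks = tokens_by_doc[doc_id]
        left = " ".join(toks[max(0, pos - width):pos])
        right = " ".join(toks[pos + 1:pos + 1 + width])
        lines.append((doc_id, left, toks[pos], right))
    return lines


def frequency_list(tokens_by_doc):
    """Corpus-wide (token, count) pairs, most frequent first."""
    counts = Counter()
    for tokens in tokens_by_doc.values():
        counts.update(tokens)
    return counts.most_common()


if __name__ == "__main__":
    # Toy sentences with mixed apostrophe variants (not corpus data).
    docs = [
        Document("d1", "Sample A", 2020, "essay",
                 "Bu so'z ma'no beradi. So\u2018z kuchi katta."),
        Document("d2", "Sample B", 2021, "poem",
                 "Har bir so`z o'z o'rnida."),
    ]
    index, tokens_by_doc = build_index(docs)
    for line in concordance("so'z", index, tokens_by_doc, width=2):
        print(line)
    print(frequency_list(tokens_by_doc)[:3])
```

Because normalization runs before tokenization, the three spellings `so'z`, `So‘z` and ``so`z`` collapse into one type and are counted together. Mapping every apostrophe-like mark after o or g to U+02BB is only a heuristic. It will misread a closing single quotation mark that follows a word ending in o or g, so OCR output would still need manual review or a richer rule set.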

