PARALLEL CORPUS AS AN INTEGRAL PART OF THE LANGUAGE CORPUS
Abstract
A language corpus is a large electronic collection of texts used to study various aspects of language, including syntax, vocabulary, stylistics, and semantics. One of its key types is the parallel corpus, which consists of texts paired with their translations into another language. This pairing supports comparative linguistic analysis and supplies data for machine translation and for the development of natural language processing (NLP) technologies. Parallel corpora therefore occupy an important place in modern linguistics and form an integral part of the language corpus. The purpose of this work is to examine the parallel corpus as a component of the language corpus and its role in various fields of linguistics and technology.





