Parallel corpus of texts ПарКУМ

  • N.P. Darchuk, M.O. Langenbakh, V.M. Sorokin, Ya.V. Khodakivska
Keywords: parallel corpus, corpus linguistics, annotation of the corpus, parallel texts, translation equivalents

Abstract

Although Ukrainian corpus linguistics has made visible progress, one field remains almost unexplored: parallel corpora. In this paper we present a project for a parallel corpus containing Ukrainian texts. The goal of our research is to formulate the basic principles of parallel corpus development. The tasks addressed are: to define the directions of translation and the requirements for the textual material; to choose the necessary annotation parameters; to design the architecture of the corpus system and define user roles; and to develop the user interface. The article describes all types of tagging specified for the corpus texts: metadata, structural annotation and linguistic annotation. The corpus works in two modes: administrative (available after registration at http://mova.info) and search. The project is currently running in test mode. Ukrainian-English and English-Ukrainian parallel texts are now being collected, and some of them are already available for search. In later stages the corpus will be filled with other parallel texts: Polish, Bulgarian, Turkish, German, etc.

Published
2018-05-11
How to Cite
Darchuk, N.P., Langenbakh, M.O., Sorokin, V.M., & Khodakivska, Ya.V. (2018, May 11). Parallel corpus of texts ПарКУМ. Scientific Journal of National Pedagogical Dragomanov University. Series 9. Current Trends in Language Development, (15), 28-35. Retrieved from https://sjnpu.com.ua/index.php/journal/article/view/73