Capabilities of Semantic Tagging Within the Ukrainian Corpus

N.P. Darchuk
Keywords: linguistic corpus, semantic tagging, taxonomic classification, taxon, thesaurus, information retrieval system

Abstract

The article examines linguistic aspects of semantic tagging within the Ukrainian Corpus. The vocabulary of texts of different genres, namely modern fiction, drama, journalism, and scientific, popular scientific, and business texts, is to be provided with genre-specific tagging. The work presents two types of tagging: (I) taxonomic tagging for journalistic and fiction texts and (II) thesaurus-based tagging specifically for scientific and business texts.
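The abstract does not specify the tag inventory or the tagging procedure. As a rough illustration of what taxonomic tagging involves, the following sketch assumes a lexicon that maps each lemma to a list of semantic class tags and assigns those tags by dictionary lookup to already lemmatized input. The lexicon `TAXONOMY`, its tag labels, and the function `tag_taxonomic` are hypothetical and are not the classification used in the article.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment of a taxonomic lexicon: lemma -> semantic class tags.
# The tag labels are illustrative only, not the article's classification.
TAXONOMY: Dict[str, List[str]] = {
    "лікар": ["concrete", "person", "profession"],      # "doctor"
    "стіл": ["concrete", "artifact", "furniture"],      # "table"
    "радість": ["abstract", "emotion"],                 # "joy"
    "читати": ["action", "intellectual_activity"],      # "to read"
}


def tag_taxonomic(lemmas: List[str]) -> List[Tuple[str, List[str]]]:
    """Attach taxonomic tags to each lemma; lemmas missing from the lexicon get []."""
    # Lookup is case-insensitive; a copy of the tag list is returned so callers
    # cannot accidentally modify the shared lexicon.
    return [(lemma, list(TAXONOMY.get(lemma.lower(), []))) for lemma in lemmas]


if __name__ == "__main__":
    for lemma, tags in tag_taxonomic(["Лікар", "читати", "газета"]):
        print(lemma, tags)
```

A plain lookup of this kind leaves polysemous lemmas unresolved; choosing the right tag for a word in context would need a separate disambiguation step, which this sketch omits.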
The taxonomic tagging is based on the taxonomic classification used in the Russian National Corpus, extended and further modified. Software tools for online work were developed using a frequency dictionary of the journalistic style containing 40,000 lexemes, compiled from a sample of 16 million word forms of Ukrainian texts. The thesaurus-based approach rests on identifying thematically relevant lexical-semantic variants and grouping them by a formalized method of thesaurus construction that meets the standards of modern terminography. Software tools were developed for performing both types of semantic tagging.
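The abstract does not describe the structure of the thesaurus itself. Assuming a conventional information-retrieval thesaurus, in which non-preferred terms point to a preferred descriptor (USE) and descriptors are linked to broader terms (BT), thesaurus-based tagging of a term can be sketched as mapping it to its descriptor and then collecting the chain of broader descriptors. The dictionaries `USE` and `BROADER`, their entries, and the function `thesaurus_tags` are hypothetical illustrations, not the article's thesaurus or software.

```python
from typing import Dict, List, Optional

# Hypothetical thesaurus fragment (illustrative entries, not from the article).
# USE: non-preferred term -> preferred descriptor.
USE: Dict[str, str] = {
    "позика": "кредит",   # "loan" -> "credit"
    "позичка": "кредит",  # colloquial "small loan" -> "credit"
}

# BT: descriptor -> broader descriptor; None marks a top term.
BROADER: Dict[str, Optional[str]] = {
    "кредит": "фінансові послуги",     # credit -> financial services
    "депозит": "фінансові послуги",    # deposit -> financial services
    "фінансові послуги": "економіка",  # financial services -> economy
    "економіка": None,                 # economy is a top term
}


def thesaurus_tags(term: str) -> List[str]:
    """Map a term to its descriptor, then append every broader term up to the top."""
    key = term.lower()
    descriptor = USE.get(key, key)
    if descriptor not in BROADER:
        return []  # the term is not covered by this thesaurus fragment
    chain: List[str] = []
    current: Optional[str] = descriptor
    # Walk the BT links upward; the membership check guards against cycles.
    while current is not None and current not in chain:
        chain.append(current)
        current = BROADER.get(current)
    return chain


if __name__ == "__main__":
    for word in ["позика", "депозит", "погода"]:
        print(word, thesaurus_tags(word))
```

Tagging a term with its whole broader-term chain is one way to let a query on a general topic such as "економіка" also retrieve texts that mention only "кредит"; the abstract does not state whether the article's tools work this way.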

Published: 2018-05-11

How to cite: Darchuk, N. (2018, May 11). Capabilities of Semantic Tagging Within the Ukrainian Corpus. Scientific Journal of National Pedagogical Dragomanov University. Series 9. Current Trends in Language Development, (15), 18-28. Retrieved from https://sjnpu.com.ua/index.php/journal/article/view/72