Cognate identification using machine translation

Research output: Contribution to journal › Conference paper › peer-review

Abstract

In this paper we describe an approach to automatic cognate identification in monolingual texts using machine translation. This system was our entry in the 2015 ALTA shared task, where it achieved an F1-score of 63% on the test set. Our proposed approach takes an input text in a source language and uses statistical machine translation to create a word-aligned parallel text in the target language. A robust measure of string distance, in this case the Jaro-Winkler distance, is then applied to the pairs of aligned words to detect potential cognates. Further extensions to improve the method are also discussed.
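The string-comparison step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the machine translation system has already produced aligned word pairs, and the function names (`jaro`, `jaro_winkler`, `find_cognates`), the case folding, and the 0.8 threshold are hypothetical choices made only for the example. The code computes Jaro-Winkler similarity; the corresponding distance is 1 minus that value.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def find_cognates(aligned_pairs, threshold=0.8):
    """Keep aligned (source, target) word pairs whose similarity meets the threshold.

    The threshold value is an assumption for illustration, not taken from the paper.
    """
    return [
        (src, tgt)
        for src, tgt in aligned_pairs
        if jaro_winkler(src.casefold(), tgt.casefold()) >= threshold
    ]


# Hypothetical aligned pairs, standing in for the output of the translation step.
pairs = [("nation", "nation"), ("dog", "chien")]
print(find_cognates(pairs))
```

In practice the threshold would need to be tuned on held-out data, since a high value favours precision and a low value favours recall.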
Original language: English
Pages (from-to): 138-141
Number of pages: 4
Journal: ALTA 2015: Proceedings of Australasian Language Technology Association Workshop 2015
Publication status: Published - 2015
Event: Australasian Language Technology Association Workshop (13th : 2015) - Parramatta, NSW
Duration: 8 Dec 2015 → 9 Dec 2015