Abstract
In this paper we describe an approach to automatic cognate identification in monolingual texts using machine translation. This system was our entry in the 2015 ALTA shared task, where it achieved an F1-score of 63% on the test set. The approach takes an input text in a source language and uses statistical machine translation to create a word-aligned parallel text in the target language. A robust measure of string distance, in this case the Jaro-Winkler distance, is then applied to each pair of aligned words to detect potential cognates. Further extensions to improve the method are also discussed.
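The scoring step described above can be illustrated with a minimal sketch. It assumes the word alignment has already been produced by a machine translation system and is available as a list of (source word, target word) pairs. The function names, the lowercasing step, the example words, and the 0.8 similarity threshold are all hypothetical choices for illustration and are not taken from the paper. The Jaro-Winkler computation uses the standard definition, with a prefix scale of 0.1 and a prefix length capped at 4.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they lie within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro-Winkler similarity: Jaro boosted by a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Distance form: 0 for identical strings, 1 for no similarity."""
    return 1.0 - jaro_winkler(s1, s2)


def find_cognates(aligned_pairs, threshold=0.8):
    """Return aligned pairs whose similarity meets the threshold.

    `aligned_pairs` stands in for the output of a word aligner, and
    `threshold` is an assumed value, not one reported in the paper.
    """
    return [
        (src, tgt)
        for src, tgt in aligned_pairs
        if jaro_winkler(src.lower(), tgt.lower()) >= threshold
    ]


# Illustrative aligned pairs (hypothetical data, not from the shared task).
pairs = [("nation", "nation"), ("dog", "chien")]
candidates = find_cognates(pairs)
```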
Original language | English |
---|---|
Pages (from-to) | 138-141 |
Number of pages | 4 |
Journal | ALTA 2015 : Proceedings of Australasian Language Technology Association Workshop 2015 |
Publication status | Published - 2015 |
Event | Australasian Language Technology Association Workshop (13th : 2015) - Parramatta, NSW. Duration: 8 Dec 2015 → 9 Dec 2015 |