Abstract
We outline the first application of Native Language Identification (NLI) to Finnish learner data. NLI is the task of predicting an author's first language (L1) from writings in an acquired language. Using data from a new learner corpus of Finnish, a language typologically quite different from those previously investigated and whose morphological richness may cause difficulties, we show that a combination of three feature types is useful for this task. Our system achieves an accuracy of 70% against a baseline of 20% for predicting an author's L1. Using the same features, we can also distinguish non-native writings from native ones with an accuracy of 97%. This methodology can be useful for studying language transfer effects, for developing teaching materials tailored to students’ native language, and for forensic linguistics.
Original language | English |
---|---|
Pages (from-to) | 139-144 |
Number of pages | 6 |
Journal | Proceedings of Australasian Language Technology Association Workshop 2014 : ALTA 2014 |
Publication status | Published - 2014 |
Event | Australasian Language Technology Association Workshop (12th : 2014) - Melbourne, Australia. Duration: 26 Nov 2014 → 28 Nov 2014 |