Finnish Native Language Identification

Research output: Contribution to journal › Conference paper › Peer-review


We outline the first application of Native Language Identification (NLI) to Finnish learner data. NLI is the task of predicting an author's first language (L1) from their writing in an acquired language. Using data from a new learner corpus of Finnish, a language typologically quite different from those previously investigated and whose morphological richness could cause difficulties, we show that a combination of three feature types is useful for this task. Our system predicts an author's L1 with an accuracy of 70%, against a baseline of 20%. Using the same features, we can also distinguish non-native from native writing with an accuracy of 97%. This methodology can be useful for studying language transfer effects, for developing teaching materials tailored to students' native languages, and for forensic linguistics.
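NLI is usually framed as supervised text classification: each essay becomes a feature vector and a classifier predicts its L1 label, and the result is compared with a simple baseline. The abstract does not name its three feature types or its classifier, so the sketch below is only an illustration under stated assumptions: it uses character trigrams and a multinomial Naive Bayes classifier as a swapped-in stand-in, plus a majority-class baseline as one common choice of baseline. The names `char_ngrams`, `NaiveBayesNLI`, `accuracy` and `majority_baseline` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padding the text with one space on each side."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class NaiveBayesNLI:
    """Hypothetical stand-in: multinomial Naive Bayes over character n-grams.

    This is NOT the feature set or classifier used in the paper.
    """

    def __init__(self, n: int = 3, alpha: float = 1.0):
        self.n = n
        self.alpha = alpha                       # Laplace/Lidstone smoothing constant
        self.class_counts = Counter()            # documents seen per L1 label
        self.feature_counts = defaultdict(Counter)  # n-gram counts per L1 label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            feats = char_ngrams(text, self.n)
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        """Return the label with the highest smoothed log-posterior score."""
        feats = char_ngrams(text, self.n)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)  # log prior
            for gram, freq in feats.items():
                score += freq * math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def majority_baseline(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return accuracy([majority] * len(test_labels), test_labels)


if __name__ == "__main__":
    # Toy data only; real NLI uses learner essays labelled with the author's L1.
    train_x = ["aaaa aaa", "aaa aa", "bbbb bbb", "bbb bb"]
    train_y = ["A", "A", "B", "B"]
    model = NaiveBayesNLI().fit(train_x, train_y)
    test_x, test_y = ["aaa", "bbb"], ["A", "B"]
    preds = [model.predict(t) for t in test_x]
    print(accuracy(preds, test_y), majority_baseline(train_y, test_y))
```

Character n-grams are used here because they need no tagger or parser and may pick up sub-word cues such as case endings, which seems plausible for a morphologically rich language like Finnish; whether they are among the paper's three feature types is not stated in this record.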
Original language: English
Pages (from-to): 139-144
Number of pages: 6
Journal: Proceedings of Australasian Language Technology Association Workshop 2014 : ALTA 2014
Publication status: Published - 2014
Event: Australasian Language Technology Association Workshop (12th : 2014), Melbourne, Australia
Duration: 26 Nov 2014 to 28 Nov 2014


