Norwegian Native Language Identification

Shervin Malmasi, Mark Dras, Irina P. Temnikova

Research output: Contribution to journal, Conference paper, peer-review

Abstract

We present a study of Native Language Identification (NLI) using data from learners of Norwegian, a language not yet used for this task. NLI is the task of predicting a writer’s first language using only their writings in a learned language. We find that three feature types (function words, part-of-speech n-grams, and a hybrid part-of-speech/function word mixture n-gram model) are useful here. Our system achieves an accuracy of 79% against a baseline of 13% for predicting an author’s L1. The same features can distinguish non-native writing with 99% accuracy. We also find that part-of-speech n-gram performance on this data deviates from previous NLI results, possibly due to the use of manually post-corrected tags.
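
The hybrid part-of-speech/function word feature named in the abstract can be illustrated with a minimal sketch. It assumes the common construction in which function words keep their surface form and every other token is replaced by its part-of-speech tag before n-grams are extracted; the abstract does not spell out the exact definition, so this is an illustration rather than the authors' implementation. The function-word list, the tag names, and the helper `mixture_ngrams` are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical toy function-word list (assumption); the list used in the
# paper is not given in the abstract.
FUNCTION_WORDS: Set[str] = {"jeg", "og", "i", "en", "det", "på"}


def mixture_ngrams(tagged: List[Tuple[str, str]], n: int = 2) -> List[Tuple[str, ...]]:
    """Build POS/function-word mixture n-grams from (token, POS) pairs.

    Function words are kept as lowercased surface forms; all other tokens
    are replaced by their POS tag. N-grams are then taken over the result.
    """
    mixed = [
        tok.lower() if tok.lower() in FUNCTION_WORDS else pos
        for tok, pos in tagged
    ]
    return [tuple(mixed[i:i + n]) for i in range(len(mixed) - n + 1)]


# Hand-tagged toy sentence "Jeg bor i en by" ("I live in a town").
# The tags are illustrative, not output from a real tagger.
sentence = [("Jeg", "PRON"), ("bor", "VERB"), ("i", "ADP"), ("en", "DET"), ("by", "NOUN")]
bigrams = mixture_ngrams(sentence, n=2)
print(bigrams)
```

Counts of such n-grams per document could then serve as feature vectors for a classifier, alongside plain function-word counts and pure part-of-speech n-grams.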
Original language: English
Pages (from-to): 404-412
Number of pages: 9
Journal: RANLP 2015 : International Conference Recent Advances in Natural Language Processing : proceedings
Publication status: Published - 2015
Event: International Conference Recent Advances in Natural Language Processing (10th : 2015) - Hissar, Bulgaria
Duration: 7 Sept 2015 to 9 Sept 2015
