Abstract
We present a study of Native Language Identification (NLI) using data from learners of Norwegian, a language not yet used for this task. NLI is the task of predicting a writer’s first language (L1) using only their writings in a learned language. We find that three feature types are useful here: function words, part-of-speech n-grams, and hybrid part-of-speech/function word mixture n-grams. Our system achieves an accuracy of 79% in predicting an author’s L1, against a baseline of 13%. The same features can distinguish non-native from native writing with 99% accuracy. We also find that part-of-speech n-gram performance on this data deviates from previous NLI results, possibly due to the use of manually post-corrected tags.
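As a rough illustration of the hybrid feature type named in the abstract, the sketch below builds n-grams over a sequence in which function words are kept as word forms and all other tokens are replaced by their part-of-speech tags. This construction is an assumption about what a "mixture" n-gram looks like, not a description taken from the paper. The function-word list, the tag names, the example sentence, and the helper names (`FUNCTION_WORDS`, `hybrid_tokens`, `hybrid_ngram_counts`) are all hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only).
FUNCTION_WORDS = {"jeg", "og", "i", "på", "en", "det", "er"}


def hybrid_tokens(tagged):
    """Keep function words as lowercase word forms; replace other tokens by their POS tag."""
    return [
        word.lower() if word.lower() in FUNCTION_WORDS else tag
        for word, tag in tagged
    ]


def ngrams(seq, n):
    """Return all contiguous n-grams of seq as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def hybrid_ngram_counts(tagged, n=2):
    """Count hybrid POS/function-word n-grams in one tagged sentence."""
    return Counter(ngrams(hybrid_tokens(tagged), n))


# Hypothetical tagged sentence; the tags are illustrative, not the paper's tag set.
sentence = [("Jeg", "PRON"), ("bor", "VERB"), ("i", "ADP"), ("Oslo", "PROPN")]
features = hybrid_ngram_counts(sentence, n=2)
```

Counts like these could be collected per document and passed to any standard text classifier. The abstract does not say which classifier or n-gram orders were used, so none is assumed here.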
Original language | English |
---|---|
Pages (from-to) | 404-412 |
Number of pages | 9 |
Journal | RANLP 2015 : International Conference Recent Advances in Natural Language Processing : proceedings |
Publication status | Published - 2015 |
Event | International Conference Recent Advances in Natural Language Processing (10th : 2015) - Hissar, Bulgaria. Duration: 7 Sept 2015 → 9 Sept 2015 |