Abstract
We present the first application of Native Language Identification (NLI) to non-English data. Motivated by theories of language transfer, NLI is the task of identifying a writer's native language (L1) based on their writings in a second language (the L2). An NLI system was applied to Chinese learner texts using topic-independent syntactic models to assess their accuracy. We find that models using part-of-speech tags, context-free grammar production rules and function words are highly effective, achieving a maximum accuracy of 71%. Interestingly, we also find that when applied to equivalent English data, the model performance is almost identical. This finding suggests a systematic pattern of cross-linguistic transfer may exist, where the degree of transfer is independent of the L1 and L2.
Original language | English |
---|---|
Title of host publication | EACL 2014 |
Subtitle of host publication | 14th Conference of the European Chapter of the Association for Computational Linguistics : proceedings of the conference |
Place of Publication | Stroudsburg, PA, USA |
Publisher | Association for Computational Linguistics |
Pages | 95-99 |
Number of pages | 5 |
ISBN (Print) | 9781937284787 |
Publication status | Published - 2014 |
Event | Conference of the European Chapter of the Association for Computational Linguistics (14th : 2014) - Gothenburg, Sweden. Duration: 26 Apr 2014 → 30 Apr 2014 |
Conference
Conference | Conference of the European Chapter of the Association for Computational Linguistics (14th : 2014) |
---|---|
City | Gothenburg, Sweden |
Period | 26/04/14 → 30/04/14 |