Abstract
We present a large-scale Native Language Identification (NLI) experiment on new data, with a focus on cross-corpus evaluation to identify corpus- and genre-independent language transfer features. We test a new corpus and show that it is comparable to other NLI corpora and suitable for this task. Cross-corpus evaluation on two large corpora achieves good accuracy and provides evidence of reliable language transfer features, although the lower performance relative to within-corpus evaluation suggests that NLI models are not completely portable across corpora. Finally, we present a brief case study of features distinguishing Japanese learners' English writing, demonstrating the presence of cross-corpus and cross-genre language transfer features that are highly applicable to SLA and ESL research.
Original language | English |
---|---|
Title of host publication | 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 |
Subtitle of host publication | Proceedings of the Conference |
Place of Publication | Red Hook, NY |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1403-1409 |
Number of pages | 7 |
ISBN (Electronic) | 9781941643495 |
Publication status | Published - 2015 |
Event | Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015, Denver, United States. Duration: 31 May 2015 → 5 Jun 2015 |
Other
Other | Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 |
---|---|
Country/Territory | United States |
City | Denver |
Period | 31/05/15 → 05/06/15 |