Large-scale Native Language Identification with cross-corpus evaluation

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

10 Citations (Scopus)

Abstract

We present a large-scale Native Language Identification (NLI) experiment on new data, with a focus on cross-corpus evaluation to identify corpus- and genre-independent language transfer features. We test a new corpus and show it is comparable to other NLI corpora and suitable for this task. Cross-corpus evaluation on two large corpora achieves good accuracy and evidences the existence of reliable language transfer features, but lower performance also suggests that NLI models are not completely portable across corpora. Finally, we present a brief case study of features distinguishing Japanese learners' English writing, demonstrating the presence of cross-corpus and cross-genre language transfer features that are highly applicable to SLA and ESL research.

Original language: English
Title of host publication: 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Subtitle of host publication: Proceedings of the Conference
Place of Publication: Red Hook, NY
Publisher: Association for Computational Linguistics (ACL)
Pages: 1403-1409
Number of pages: 7
ISBN (Electronic): 9781941643495
Publication status: Published - 2015
Event: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 - Denver, United States
Duration: 31 May 2015 - 5 Jun 2015

Other

Other: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Country: United States
City: Denver
Period: 31/05/15 - 5/06/15

Cite this

Malmasi, S., & Dras, M. (2015). Large-scale Native Language Identification with cross-corpus evaluation. In 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015: Proceedings of the Conference (pp. 1403-1409). Red Hook, NY: Association for Computational Linguistics (ACL).