Large-scale Native Language Identification with cross-corpus evaluation

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › Research › peer-review

Abstract

We present a large-scale Native Language Identification (NLI) experiment on new data, with a focus on cross-corpus evaluation to identify corpus- and genre-independent language transfer features. We test a new corpus and show it is comparable to other NLI corpora and suitable for this task. Cross-corpus evaluation on two large corpora achieves good accuracy and evidences the existence of reliable language transfer features, but lower performance also suggests that NLI models are not completely portable across corpora. Finally, we present a brief case study of features distinguishing Japanese learners' English writing, demonstrating the presence of cross-corpus and cross-genre language transfer features that are highly applicable to SLA and ESL research.
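The abstract's central method is cross-corpus evaluation. A model is trained on one corpus and tested on another, and accuracy lower than within-corpus evaluation suggests the model is not fully portable across corpora. The following is a minimal sketch of that protocol only. It assumes a toy add-one-smoothed multinomial Naive Bayes classifier over whitespace tokens. The classifier, the function names (`train_nb`, `predict_nb`, `accuracy`, `cross_corpus_accuracy`), and the two tiny corpora with labels `X` and `Y` are all hypothetical. None of them are the authors' system, features, or data.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Fit a toy multinomial Naive Bayes model on (text, label) pairs.

    Hypothetical stand-in classifier for illustration; not the paper's system.
    """
    class_docs = Counter()               # number of documents per label
    word_counts = defaultdict(Counter)   # token counts per label
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for one document."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        # Log prior plus add-one-smoothed log likelihood of each token.
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of (text, label) pairs the model labels correctly."""
    correct = sum(predict_nb(model, text) == label for text, label in docs)
    return correct / len(docs)


def cross_corpus_accuracy(train_corpus, test_corpus):
    """Train on one corpus and evaluate on a different corpus."""
    return accuracy(train_nb(train_corpus), test_corpus)


# Hypothetical toy corpora: labels stand in for native languages (L1s).
corpus_a = [
    ("alpha alpha beta", "X"),
    ("alpha gamma", "X"),
    ("delta delta epsilon", "Y"),
    ("delta zeta", "Y"),
]
corpus_b = [
    ("alpha beta", "X"),
    ("delta epsilon", "Y"),
    ("zeta zeta gamma", "Y"),
    ("gamma gamma zeta", "Y"),  # cues distributed differently than in corpus_a
]

if __name__ == "__main__":
    print(cross_corpus_accuracy(corpus_a, corpus_b))  # prints 0.75
```

Here the last document in `corpus_b` is misclassified because its cues are distributed differently from the training corpus, which lowers cross-corpus accuracy. In a real comparison the within-corpus baseline would come from held-out data, for example k-fold cross-validation on the training corpus, rather than from accuracy on the training documents themselves.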

Language: English
Title of host publication: 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Subtitle of host publication: Proceedings of the Conference
Place of publication: Red Hook, NY
Publisher: Association for Computational Linguistics (ACL)
Pages: 1403-1409
Number of pages: 7
ISBN (Electronic): 9781941643495
Publication status: Published - 2015
Event: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 - Denver, United States
Duration: 31 May 2015 to 5 Jun 2015

Other

Other: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Country: United States
City: Denver
Period: 31/05/15 to 5/06/15


Cite this

Malmasi, S., & Dras, M. (2015). Large-scale Native Language Identification with cross-corpus evaluation. In 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015: Proceedings of the Conference (pp. 1403-1409). Red Hook, NY: Association for Computational Linguistics (ACL).
@inproceedings{fd5514ec0d1e4b4a9a4a6f62ddc7c165,
title = "Large-scale Native Language Identification with cross-corpus evaluation",
abstract = "We present a large-scale Native Language Identification (NLI) experiment on new data, with a focus on cross-corpus evaluation to identify corpus- and genre-independent language transfer features. We test a new corpus and show it is comparable to other NLI corpora and suitable for this task. Cross-corpus evaluation on two large corpora achieves good accuracy and evidences the existence of reliable language transfer features, but lower performance also suggests that NLI models are not completely portable across corpora. Finally, we present a brief case study of features distinguishing Japanese learners' English writing, demonstrating the presence of cross-corpus and cross-genre language transfer features that are highly applicable to SLA and ESL research.",
author = "Shervin Malmasi and Mark Dras",
year = "2015",
language = "English",
pages = "1403--1409",
booktitle = "2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015",
publisher = "Association for Computational Linguistics (ACL)",
address = "Red Hook, NY",
isbn = "9781941643495",
}

TY  - GEN
T1  - Large-scale Native Language Identification with cross-corpus evaluation
AU  - Malmasi, Shervin
AU  - Dras, Mark
PY  - 2015
Y1  - 2015
N2  - We present a large-scale Native Language Identification (NLI) experiment on new data, with a focus on cross-corpus evaluation to identify corpus- and genre-independent language transfer features. We test a new corpus and show it is comparable to other NLI corpora and suitable for this task. Cross-corpus evaluation on two large corpora achieves good accuracy and evidences the existence of reliable language transfer features, but lower performance also suggests that NLI models are not completely portable across corpora. Finally, we present a brief case study of features distinguishing Japanese learners' English writing, demonstrating the presence of cross-corpus and cross-genre language transfer features that are highly applicable to SLA and ESL research.
AB  - We present a large-scale Native Language Identification (NLI) experiment on new data, with a focus on cross-corpus evaluation to identify corpus- and genre-independent language transfer features. We test a new corpus and show it is comparable to other NLI corpora and suitable for this task. Cross-corpus evaluation on two large corpora achieves good accuracy and evidences the existence of reliable language transfer features, but lower performance also suggests that NLI models are not completely portable across corpora. Finally, we present a brief case study of features distinguishing Japanese learners' English writing, demonstrating the presence of cross-corpus and cross-genre language transfer features that are highly applicable to SLA and ESL research.
UR  - http://www.scopus.com/inward/record.url?scp=84960157803&partnerID=8YFLogxK
M3  - Conference proceeding contribution
SP  - 1403
EP  - 1409
BT  - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
PB  - Association for Computational Linguistics (ACL)
CY  - Red Hook, NY
ER  - 