Oracle and human baselines for native language identification

Shervin Malmasi, Joel Tetreault, Mark Dras

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

We examine different ensemble methods, including an oracle, to estimate the upper limit of classification accuracy for Native Language Identification (NLI). The oracle outperforms state-of-the-art systems by over 10%, and the results indicate that for many misclassified texts the correct class label receives a significant portion of the ensemble votes, often being the runner-up. We also present a pilot study of human performance for NLI, the first such experiment. While some participants achieved modest results on our simplified setup with 5 L1s, they did not outperform our NLI system, and this performance gap is likely to widen on the standard NLI setup.
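The abstract does not spell out how the oracle or the vote analysis is computed. The sketch below assumes the common definition in which an oracle counts a text as correctly classified whenever at least one ensemble member predicts its gold L1, and contrasts it with plurality voting and with a count of misclassified texts whose gold label is the runner-up. The function names, L1 codes and toy predictions are hypothetical and purely illustrative; they are not the authors' classifiers or data.

```python
from collections import Counter

def plurality_vote(predictions):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]

def oracle_correct(predictions, gold):
    """Oracle is right if ANY ensemble member predicted the gold label."""
    return gold in predictions

def gold_rank(predictions, gold):
    """1-based rank of the gold label by vote count, or None if it got no votes."""
    counts = Counter(predictions)
    if gold not in counts:
        return None
    ranked = [label for label, _ in counts.most_common()]
    return ranked.index(gold) + 1

def evaluate(ensemble_preds, gold_labels):
    """Return (plurality accuracy, oracle accuracy, #misclassified with gold as runner-up)."""
    n = len(gold_labels)
    pairs = list(zip(ensemble_preds, gold_labels))
    vote_acc = sum(plurality_vote(p) == g for p, g in pairs) / n
    oracle_acc = sum(oracle_correct(p, g) for p, g in pairs) / n
    runner_up = sum(
        1 for p, g in pairs if plurality_vote(p) != g and gold_rank(p, g) == 2
    )
    return vote_acc, oracle_acc, runner_up

# Hypothetical toy data: each row holds the labels predicted by 3 base classifiers.
ensemble_preds = [
    ["ARA", "ARA", "HIN"],  # plurality correct
    ["KOR", "KOR", "JPN"],  # plurality wrong, gold is runner-up, oracle correct
    ["TEL", "TEL", "TEL"],  # every member wrong
    ["KOR", "JPN", "KOR"],  # plurality correct
]
gold_labels = ["ARA", "JPN", "HIN", "KOR"]

print(evaluate(ensemble_preds, gold_labels))  # (0.5, 0.75, 1)
```

On this toy input the oracle scores above plurality voting because one text is misclassified by the vote even though a member predicted its gold label, which is the kind of gap between oracle and ensemble accuracy the abstract describes.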
Original language: English
Title of host publication: NAACL HLT 2015
Subtitle of host publication: The Tenth Workshop on Innovative Use of NLP for Building Educational Applications: proceedings of the workshop
Place of publication: Red Hook, New York
Publisher: The Association for Computational Linguistics
Pages: 172-178
Number of pages: 7
ISBN (Print): 9781941643358
Publication status: Published - 2015
Event: Workshop on Innovative Use of NLP for Building Educational Applications (10th : 2015) - Denver, CO
Duration: 4 Jun 2015 to 4 Jun 2015

Workshop

Workshop: Workshop on Innovative Use of NLP for Building Educational Applications (10th : 2015)
City: Denver, CO
Period: 4/06/15 to 4/06/15

