Arabic Native Language Identification

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution


In this paper we present the first application of Native Language Identification (NLI) to Arabic learner data. NLI, the task of predicting a writer's first language from their writing in other languages, has been investigated mostly with English data but is now expanding to other languages. Using L2 texts from the newly released Arabic Learner Corpus and a combination of three syntactic features (CFG production rules, Arabic function words and part-of-speech n-grams), we demonstrate that these features are useful for this task. Our system achieves an accuracy of 41% against a baseline of 23%, providing the first evidence for classifier-based detection of language transfer effects in L2 Arabic. Such methods can be useful for studying language transfer, developing teaching materials tailored to students' native languages, and forensic linguistics. Future directions are discussed.
Original language: English
Title of host publication: EMNLP 2014
Subtitle of host publication: the 2014 Conference on Empirical Methods In Natural Language Processing : proceedings of the conference
Place of Publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
Number of pages: 7
ISBN (Print): 9781937284961
Publication status: Published - 2014
Event: Conference on Empirical Methods In Natural Language Processing (2014) - Doha, Qatar
Duration: 25 Oct 2014 to 29 Oct 2014


Conference: Conference on Empirical Methods In Natural Language Processing (2014)
City: Doha, Qatar
