Exploring adaptor grammars for native language identification

Sze Meng Jojo Wong, Mark Dras, Mark Johnson

Research output: Chapter in Book/Report/Conference proceedingConference proceeding contributionpeer-review

35 Citations (Scopus)

Abstract

The task of inferring the native language of an author based on texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part of speech tags (for small and fixed n) and un-igram function words. To capture arbitrarily long n-grams that syntax-based approaches have suggested are useful, adaptor grammars have some promise. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maxent and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a maxent model perform better still, but that the story is more mixed for the syntactic language model.

Original languageEnglish
Title of host publicationEMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference
Place of PublicationStroudsburg, PA
PublisherAssociation for Computational Linguistics (ACL)
Pages699-709
Number of pages11
ISBN (Print)9781937284435
Publication statusPublished - 2012
Event2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012 - Jeju Island, Korea, Republic of
Duration: 12 Jul 201214 Jul 2012

Other

Other2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012
Country/TerritoryKorea, Republic of
CityJeju Island
Period12/07/1214/07/12

Fingerprint

Dive into the research topics of 'Exploring adaptor grammars for native language identification'. Together they form a unique fingerprint.

Cite this