Exploring adaptor grammars for native language identification

Sze Meng Jojo Wong, Mark Dras, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; Research; peer-review

Abstract

The task of inferring the native language of an author based on texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part-of-speech tags (for small and fixed n) and unigram function words. To capture arbitrarily long n-grams that syntax-based approaches have suggested are useful, adaptor grammars have some promise. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maxent and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a maxent model perform better still, but that the story is more mixed for the syntactic language model.
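As a rough illustration of the kind of feature the abstract describes (n-grams over a mix of PoS tags and function words), the following Python sketch enumerates fixed-length n-grams over such a mixed sequence. It is not the authors' method: the paper learns collocations of arbitrary length with adaptor grammars, whereas this sketch simply counts all n-grams up to a small fixed n. The function-word list, the helper names `mixed_sequence` and `ngram_features`, and the tagged example sentence are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, tiny function-word list; a real system would use a much larger one.
FUNCTION_WORDS = {"the", "a", "of", "in", "to", "is"}


def mixed_sequence(tagged_tokens, function_words=FUNCTION_WORDS):
    """Keep function words as (lowercased) words; replace other tokens by their PoS tag."""
    return [
        word.lower() if word.lower() in function_words else tag
        for word, tag in tagged_tokens
    ]


def ngram_features(symbols, max_n=3):
    """Count every contiguous n-gram of length 1..max_n over the symbol sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(symbols) - n + 1):
            counts[" ".join(symbols[i:i + n])] += 1
    return counts


# Assumed example input: (word, PoS tag) pairs from some external tagger.
tagged = [
    ("The", "DT"), ("student", "NN"), ("is", "VBZ"),
    ("in", "IN"), ("the", "DT"), ("library", "NN"),
]
symbols = mixed_sequence(tagged)
features = ngram_features(symbols, max_n=3)
```

Counts of this kind could then be passed as a feature vector to any maximum-entropy (multinomial logistic regression) classifier; the choice of classifier implementation is left open here.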

Language: English
Title of host publication: EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 699-709
Number of pages: 11
ISBN (Print): 9781937284435
Publication status: Published - 2012
Event: 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012 - Jeju Island, Korea, Republic of
Duration: 12 Jul 2012 to 14 Jul 2012

Cite this

Wong, S. M. J., Dras, M., & Johnson, M. (2012). Exploring adaptor grammars for native language identification. In EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference (pp. 699-709). Stroudsburg, PA: Association for Computational Linguistics (ACL).
Wong, Sze Meng Jojo ; Dras, Mark ; Johnson, Mark. / Exploring adaptor grammars for native language identification. EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference. Stroudsburg, PA : Association for Computational Linguistics (ACL), 2012. pp. 699-709
@inproceedings{7e3e434fe079414ea8699376b5cbcf69,
title = "Exploring adaptor grammars for native language identification",
abstract = "The task of inferring the native language of an author based on texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part-of-speech tags (for small and fixed n) and unigram function words. To capture arbitrarily long n-grams that syntax-based approaches have suggested are useful, adaptor grammars have some promise. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maxent and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a maxent model perform better still, but that the story is more mixed for the syntactic language model.",
author = "Wong, {Sze Meng Jojo} and Mark Dras and Mark Johnson",
year = "2012",
language = "English",
isbn = "9781937284435",
pages = "699--709",
booktitle = "EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference",
publisher = "Association for Computational Linguistics (ACL)",

}

Wong, SMJ, Dras, M & Johnson, M 2012, Exploring adaptor grammars for native language identification. in EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference. Association for Computational Linguistics (ACL), Stroudsburg, PA, pp. 699-709, 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, Jeju Island, Korea, Republic of, 12/07/12.

Exploring adaptor grammars for native language identification. / Wong, Sze Meng Jojo; Dras, Mark; Johnson, Mark.

EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference. Stroudsburg, PA : Association for Computational Linguistics (ACL), 2012. p. 699-709.

TY - GEN

T1 - Exploring adaptor grammars for native language identification

AU - Wong, Sze Meng Jojo

AU - Dras, Mark

AU - Johnson, Mark

PY - 2012

Y1 - 2012

N2 - The task of inferring the native language of an author based on texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part-of-speech tags (for small and fixed n) and unigram function words. To capture arbitrarily long n-grams that syntax-based approaches have suggested are useful, adaptor grammars have some promise. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maxent and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a maxent model perform better still, but that the story is more mixed for the syntactic language model.

AB - The task of inferring the native language of an author based on texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part-of-speech tags (for small and fixed n) and unigram function words. To capture arbitrarily long n-grams that syntax-based approaches have suggested are useful, adaptor grammars have some promise. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maxent and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a maxent model perform better still, but that the story is more mixed for the syntactic language model.

UR - http://www.scopus.com/inward/record.url?scp=84876809022&partnerID=8YFLogxK

M3 - Conference proceeding contribution

SN - 9781937284435

SP - 699

EP - 709

BT - EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference

PB - Association for Computational Linguistics (ACL)

CY - Stroudsburg, PA

ER -

Wong SMJ, Dras M, Johnson M. Exploring adaptor grammars for native language identification. In EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference. Stroudsburg, PA: Association for Computational Linguistics (ACL). 2012. p. 699-709