Reranking and self-training for parser adaptation

David McClosky*, Eugene Charniak, Mark Johnson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference proceeding contribution

124 Citations (Scopus)

Abstract

Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, comes from an ever-increasing number of features trained on (typically) the WSJ treebank data. This has led to concern that such parsers may be too finely tuned to this corpus at the expense of portability to other genres. Such worries have merit. The standard "Charniak parser" achieves a labeled precision-recall f-measure of 89.7% on the Penn WSJ test set, but only 82.9% on the test set from the Brown treebank corpus. This paper should allay these fears. In particular, we show that the reranking parser described in Charniak and Johnson (2005) improves performance of the parser on Brown to 85.2%. Furthermore, the self-training techniques described in (McClosky et al., 2006) raise this to 87.8% (an error reduction of 28% relative to the 82.9% baseline), again without using any labeled Brown data. This is remarkable since training the parser and reranker on labeled Brown data achieves only 88.4%.
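As a quick check of the arithmetic behind the figures above, the minimal sketch below computes a relative error reduction, treating "error" as 100 minus the f-measure. It also shows the balanced F-score (the harmonic mean of labeled precision and recall), assuming that "f-measure" here means the usual balanced F1 used in parser evaluation. The function names are illustrative only and do not come from the paper; the f-scores are the Brown test-set numbers quoted in the abstract.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of labeled precision and recall.

    Assumes the abstract's "f-measure" is the standard balanced F1.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(baseline_f: float, new_f: float) -> float:
    """Fraction of the baseline's error (100 - F) removed by the new system."""
    baseline_error = 100.0 - baseline_f
    new_error = 100.0 - new_f
    return (baseline_error - new_error) / baseline_error


# Brown test-set f-scores as reported in the abstract
baseline = 82.9      # Charniak parser trained on WSJ only
reranked = 85.2      # plus the Charniak and Johnson (2005) reranker
self_trained = 87.8  # plus self-training, still no labeled Brown data

print(f"{relative_error_reduction(baseline, reranked):.1%}")      # 13.5%
print(f"{relative_error_reduction(baseline, self_trained):.1%}")  # 28.7%
```

The error falls from 17.1 to 12.2 points, so 4.9 / 17.1 is about 0.287; the abstract states this as a 28% error reduction.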

Original language: English
Title of host publication: COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Place of publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 337-344
Number of pages: 8
Volume: 1
ISBN (Print): 1932432655, 9781932432657
Publication status: Published, 2006
Externally published: Yes
Event: 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING/ACL 2006, Sydney, Australia
Duration: 17 Jul 2006 to 21 Jul 2006

