Effective self-training for parsing

David McClosky*, Eugene Charniak, Mark Johnson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution

254 Citations (Scopus)

Abstract

We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
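The relationship between the two improvement figures in the abstract can be checked with a little arithmetic. The sketch below assumes that "error" means 100 minus the f-score, and that the previous best result is 91.0% (implied by 92.1% minus the 1.1% absolute gain, but not stated directly in the abstract). The function name is illustrative only.

```python
def relative_error_reduction(old_f: float, new_f: float) -> float:
    """Return the fraction of remaining error removed by the new model.

    Assumes error is defined as (100 - f-score), with scores in percent.
    """
    old_error = 100.0 - old_f
    new_error = 100.0 - new_f
    return (old_error - new_error) / old_error


new_f = 92.1  # f-score reported in the abstract
old_f = 91.0  # assumed previous best: 92.1 minus the 1.1 absolute improvement

reduction = relative_error_reduction(old_f, new_f)
print(f"Relative error reduction: {reduction:.1%}")
```

Under these assumptions the reduction comes out at roughly 12%, which matches the figure quoted in the abstract.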

Original language: English
Title of host publication: HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings of the Main Conference
Editors: Robert C. Moore, Jeff A. Bilmes, Jennifer Chu-Carroll, Mark Sanderson
Place of Publication: East Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 152-159
Number of pages: 8
ISBN (Electronic): 9781932432626
Publication status: Published - 2006
Externally published: Yes
Event: 2006 Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, HLT-NAACL 2006 - New York, NY, United States
Duration: 4 Jun 2006 - 9 Jun 2006

Other

Other: 2006 Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, HLT-NAACL 2006
Country: United States
City: New York, NY
Period: 4/06/06 - 9/06/06
