When is self-training effective for parsing?

David McClosky*, Eugene Charniak, Mark Johnson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

33 Citations (Scopus)


Self-training has been shown capable of improving on state-of-the-art parser performance (McClosky et al., 2006), despite the conventional wisdom on the matter and several studies to the contrary (Charniak, 1997; Steedman et al., 2003). However, it has remained unclear when and why self-training is helpful. In this paper, we test four hypotheses (namely, the presence of a phase transition, the impact of search errors, the value of non-generative reranker features, and the effects of unknown words). From these experiments, we gain a better understanding of why self-training works for parsing. Since improvements from self-training are correlated with unknown bigrams and biheads but not with unknown words, the benefit of self-training appears most influenced by seeing known words in new combinations.
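The paper studies self-training with the Charniak reranking parser; as a minimal sketch of the loop itself, the example below substitutes a trivial majority-label tagger for the parser (all names and the toy word/label data are hypothetical, chosen only to illustrate train → label unlabeled data → retrain on the union):

```python
from collections import Counter, defaultdict

def train(labeled):
    # Toy stand-in for parser training: memorize the majority
    # label seen for each word in the labeled data.
    counts = defaultdict(Counter)
    for word, label in labeled:
        counts[word][label] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def predict(model, word):
    # Words unseen in training get a placeholder label.
    return model.get(word, "UNK")

def self_train(labeled, unlabeled):
    # One round of self-training: train on the gold data, label the
    # unlabeled data with that model, then retrain on the union of
    # gold and self-labeled examples.
    model = train(labeled)
    pseudo = [(w, predict(model, w)) for w in unlabeled]
    return train(labeled + pseudo)

# Hypothetical toy data: word/label pairs stand in for parse trees.
gold = [("dog", "N"), ("runs", "V"), ("dog", "N")]
raw = ["dog", "runs", "cat"]
model = self_train(gold, raw)
```

The real setting differs in the key respect the paper investigates: the self-labeled data consists of full parse trees, so retraining exposes the model to known words in new combinations (bigrams and biheads), not just to individual word labels.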

Original language: English
Title of host publication: Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
Place of publication: Manchester, UK
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 8
ISBN (Print): 9781905593446
Publication status: Published - 2008
Externally published: Yes
Event: 22nd International Conference on Computational Linguistics, Coling 2008 - Manchester, United Kingdom
Duration: 18 Aug 2008 - 22 Aug 2008


Other: 22nd International Conference on Computational Linguistics, Coling 2008
Country/Territory: United Kingdom

