Abstract
Self-training has been shown capable of improving on state-of-the-art parser performance (McClosky et al., 2006) despite the conventional wisdom on the matter and several studies to the contrary (Charniak, 1997; Steedman et al., 2003). However, it has remained unclear when and why self-training is helpful. In this paper, we test four hypotheses (namely, presence of a phase transition, impact of search errors, value of non-generative reranker features, and effects of unknown words). From these experiments, we gain a better understanding of why self-training works for parsing. Since improvements from self-training are correlated with unknown bigrams and biheads but not unknown words, the benefit of self-training appears most influenced by seeing known words in new combinations.
Original language | English |
---|---|
Title of host publication | Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference |
Place of Publication | Manchester, UK |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 561-568 |
Number of pages | 8 |
Volume | 1 |
ISBN (Print) | 9781905593446 |
Publication status | Published - 2008 |
Externally published | Yes |
Event | 22nd International Conference on Computational Linguistics, Coling 2008 - Manchester, United Kingdom |
Duration | 18 Aug 2008 → 22 Aug 2008 |