Abstract
Probabilistic context-free grammars have the unusual property of not always defining tight distributions (i.e., the sum of the "probabilities" of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian estimation of PCFGs. We begin by presenting the notion of "almost everywhere tight grammars" and show that linear CFGs satisfy it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the empirical impact of tightness.
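To make the notion of non-tightness concrete, the following minimal sketch uses a standard illustrative grammar, S → S S with probability p and S → a with probability 1 − p. It is assumed here for illustration only and is not presented as the paper's own code or method. The total probability Z of all finite trees is the least non-negative solution of Z = p·Z² + (1 − p), which iterating from zero approximates. The function name `total_tree_probability` and its parameters are hypothetical.

```python
def total_tree_probability(p, iterations=1000):
    """Approximate the total probability mass of finite trees for the
    illustrative grammar  S -> S S (prob. p) | a (prob. 1 - p).

    Starting from z = 0 and repeatedly applying z <- p * z**2 + (1 - p)
    increases monotonically towards the least fixed point, which is the
    mass assigned to finite trees.
    """
    z = 0.0
    for _ in range(iterations):
        z = p * z * z + (1.0 - p)
    return z


if __name__ == "__main__":
    for p in (0.25, 0.75):
        print(f"p = {p}: total tree probability ~ {total_tree_probability(p):.6f}")
```

For this grammar the limit is min(1, (1 − p)/p), so the grammar is tight for p ≤ 1/2 and non-tight otherwise: with p = 0.75 only one third of the probability mass falls on finite trees. Convergence of the iteration is slow exactly at the boundary p = 1/2, so the default iteration count is only adequate away from that value.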
Original language | English |
---|---|
Title of host publication | Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics |
Subtitle of host publication | ACL 2013 : 4-9 August, Sofia, Bulgaria |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1033-1041 |
Number of pages | 9 |
Volume | 1 |
ISBN (Print) | 9781937284503 |
Publication status | Published - 2013 |
Event | 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 - Sofia, Bulgaria; Duration: 4 Aug 2013 → 9 Aug 2013 |
Other
Other | 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 |
---|---|
Country/Territory | Bulgaria |
City | Sofia |
Period | 4/08/13 → 9/08/13 |