Abstract
One reason nonparametric Bayesian inference is attracting attention in computational linguistics is that it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparametric Bayesian models. This paper investigates some of the choices that arise in formulating adaptor grammars and associated inference procedures, and shows that they can have a dramatic impact on performance in an unsupervised word segmentation task. With appropriate adaptor grammars and inference procedures we achieve an 87% word token f-score on the standard Brent version of the Bernstein-Ratner corpus, an error reduction of over 35% relative to the best previously reported results for this corpus.
Original language | English |
---|---|
Title of host publication | NAACL '09 Proceedings of Human Language Technologies |
Subtitle of host publication | The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 317-325 |
Number of pages | 9 |
ISBN (Print) | 9781932432411 |
Publication status | Published - 2009 |
Externally published | Yes |
Event | Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, NAACL HLT (10th : 2009) - Boulder, United States; Duration: 31 May 2009 → 5 Jun 2009 |
Other
Other | Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, NAACL HLT (10th : 2009) |
---|---|
Country/Territory | United States |
City | Boulder |
Period | 31/05/09 → 5/06/09 |