Abstract
One of the reasons nonparametric Bayesian inference is attracting attention in computational linguistics is that it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparametric Bayesian models. This paper investigates some of the choices that arise in formulating adaptor grammars and associated inference procedures, and shows that they can have a dramatic impact on performance in an unsupervised word segmentation task. With appropriate adaptor grammars and inference procedures we achieve an 87% word token f-score on the standard Brent version of the Bernstein-Ratner corpus, an error reduction of more than 35% relative to the best previously reported results for this corpus.
| Original language | English |
|---|---|
| Title of host publication | NAACL '09 Proceedings of Human Language Technologies |
| Subtitle of host publication | The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics |
| Place of Publication | Stroudsburg, PA |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 317-325 |
| Number of pages | 9 |
| ISBN (Print) | 9781932432411 |
| Publication status | Published - 2009 |
| Externally published | Yes |
| Event | Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, NAACL HLT (10th : 2009) - Boulder, United States |
| Duration | 31 May 2009 → 5 Jun 2009 |
Other
| Other | Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, NAACL HLT (10th : 2009) |
|---|---|
| Country/Territory | United States |
| City | Boulder |
| Period | 31/05/09 → 5/06/09 |