Abstract
Adaptor grammars (Johnson et al., 2007b) are a non-parametric Bayesian extension of Probabilistic Context-Free Grammars (PCFGs) which in effect learn the probabilities of entire subtrees. In practice, this means that an adaptor grammar learns the structures useful for generating the training data as well as their probabilities. We present several different adaptor grammars that learn to segment phonemic input into words by modeling different linguistic properties of the input. One of the advantages of a grammar-based framework is that it is easy to combine grammars, and we use this ability to compare models that capture different kinds of linguistic structure. We show that incorporating both unsupervised syllabification and collocation-finding into the adaptor grammar significantly improves unsupervised word-segmentation accuracy over that achieved by adaptor grammars that model only one of these linguistic phenomena.
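The core idea the abstract describes — caching entire generated structures so they can be reused whole, with reuse probability governed by a Pitman-Yor process — can be illustrated with a toy sketch. This is not the paper's actual inference procedure (which adapts subtrees of a grammar); it is a hypothetical, simplified adaptor over strings that collapses the Pitman-Yor seating arrangement to one table per type, purely to show the cache-vs-base-distribution trade-off:

```python
import random
from collections import Counter

class PYAdaptor:
    """Toy Pitman-Yor 'adaptor' sketch: caches previously generated items
    (here, word strings) so that frequent items tend to be reused whole,
    loosely mimicking how an adaptor grammar reuses entire subtrees.

    Simplification (not the real model): each cached type is treated as a
    single table, so counts stand in for the full seating arrangement.
    """

    def __init__(self, base, discount=0.5, concentration=1.0):
        self.base = base          # base distribution: a zero-arg sampler
        self.d = discount         # Pitman-Yor discount parameter
        self.a = concentration    # Pitman-Yor concentration parameter
        self.counts = Counter()   # item -> number of times generated
        self.n = 0                # total number of draws so far

    def generate(self):
        # Probability of a fresh draw from the base distribution grows with
        # the number of cached types t: (a + d*t) / (a + n).
        t = len(self.counts)
        p_new = (self.a + self.d * t) / (self.a + self.n) if self.n else 1.0
        if random.random() < p_new:
            item = self.base()    # generate a new item from scratch
        else:
            # Reuse a cached item with probability proportional to (count - d).
            items, weights = zip(*((w, c - self.d) for w, c in self.counts.items()))
            item = random.choices(items, weights=weights)[0]
        self.counts[item] += 1
        self.n += 1
        return item
```

As more items are drawn, reuse of cached items increasingly dominates fresh draws from the base distribution, which is the "rich get richer" behaviour that lets adaptor grammars assign high probability to frequently recurring subtrees such as common words.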
Original language | English |
---|---|
Title of host publication | Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics |
Subtitle of host publication | Human Language Technologies |
Place of publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 398-406 |
Number of pages | 9 |
ISBN (Print) | 9781932432046 |
Publication status | Published - 2008 |
Externally published | Yes |
Event | 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT, Columbus, OH, United States, 15 Jun 2008 – 20 Jun 2008 |