Abstract
This paper describes a variety of nonparametric Bayesian models of word segmentation
based on Adaptor Grammars that model different aspects of the input and incorporate
different kinds of prior knowledge, and applies them to the Bantu language Sesotho.
While the overall word segmentation accuracies we find are lower than those these models achieve on
English, we also find some interesting differences in which factors contribute to better word segmentation. Specifically, modeling contextual dependencies yields little improvement in word segmentation accuracy, whereas modeling morphological structure does improve it.
Original language | English |
---|---|
Title of host publication | SIGMORPHON 2008 |
Subtitle of host publication | proceedings of the tenth meeting of ACL Special Interest Group on Computational Morphology and Phonology |
Editors | Jason Eisner, Jeffrey Heinz |
Place of Publication | Morristown, N.J. |
Publisher | Association for Computational Linguistics |
Pages | 20-27 |
Number of pages | 8 |
ISBN (Print) | 9781932432121 |
Publication status | Published - 2008 |
Externally published | Yes |
Event | Meeting of ACL Special Interest Group on Computational Morphology and Phonology (10th : 2008), Columbus, Ohio, USA, 19 Jun 2008 |