Unsupervised word segmentation for Sesotho using Adaptor Grammars

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

Abstract

This paper describes a variety of nonparametric Bayesian models of word segmentation based on Adaptor Grammars that model different aspects of the input and incorporate different kinds of prior knowledge, and applies them to the Bantu language Sesotho. While we find overall word segmentation accuracies lower than these models achieve on English, we also find some interesting differences in which factors contribute to better word segmentation. Specifically, modeling contextual dependencies brings little improvement in word segmentation accuracy, whereas modeling morphological structure does improve it.
Original language: English
Title of host publication: SIGMORPHON 2008
Subtitle of host publication: Proceedings of the tenth meeting of ACL Special Interest Group on Computational Morphology and Phonology
Editors: Jason Eisner, Jeffrey Heinz
Place of publication: Morristown, N.J.
Publisher: Association for Computational Linguistics
Pages: 20-27
Number of pages: 8
ISBN (Print): 9781932432121
Publication status: Published - 2008
Externally published: Yes
Event: Meeting of ACL Special Interest Group on Computational Morphology and Phonology (10th : 2008) - Columbus, Ohio, USA
Duration: 19 Jun 2008 to 19 Jun 2008

Conference

Conference: Meeting of ACL Special Interest Group on Computational Morphology and Phonology (10th : 2008)
City: Columbus, Ohio, USA
Period: 19/06/08 to 19/06/08
