Minimally-supervised morphological segmentation using adaptor grammars

Kairit Sirts, Sharon Goldwater

Research output: Contribution to journal › Article

Abstract

This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.
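
The model selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each word's parse under the over-articulated metagrammar has already been reduced to a mapping from nonterminal level to the boundary positions that level proposes. The level names (`Compound`, `Morph`, `SubMorph`), the data layout, the exhaustive search over level subsets, and the boundary F1 criterion are all hypothetical choices made for illustration.

```python
from itertools import combinations


def boundary_f1(predicted, gold):
    """Micro-averaged boundary F1 over parallel lists of boundary-position sets."""
    true_pos = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def boundaries_for(parse, levels):
    """Union of the boundary positions proposed by the chosen nonterminal levels."""
    selected = set()
    for level in levels:
        selected |= parse.get(level, set())
    return selected


def select_levels(parses, gold, all_levels):
    """Pick the subset of levels whose boundaries score best on the labelled words.

    Exhaustive search is an assumption that is only practical for a handful of levels.
    Ties are resolved in favour of the first (smallest) subset found.
    """
    best_levels, best_score = (), -1.0
    for size in range(1, len(all_levels) + 1):
        for subset in combinations(all_levels, size):
            predicted = [boundaries_for(parse, subset) for parse in parses]
            score = boundary_f1(predicted, gold)
            if score > best_score:
                best_levels, best_score = subset, score
    return best_levels, best_score


# Hypothetical labelled set: "walked" -> walk|ed, "unhappiness" -> un|happi|ness.
# Positions count characters to the left of each boundary.
gold = [{4}, {2, 7}]

# Hypothetical boundaries proposed by each level of an unsupervised parse.
parses = [
    {"Compound": set(), "Morph": {4}, "SubMorph": {2, 4}},
    {"Compound": set(), "Morph": {2, 7}, "SubMorph": {2, 5, 7, 9}},
]

levels, score = select_levels(parses, gold, ["Compound", "Morph", "SubMorph"])
print(levels, score)
```

Replacing `boundary_f1` with another scoring function is one way to picture the abstract's remark that the method can be tuned to different evaluation metrics or downstream tasks.
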
Original language: English
Pages (from-to): 255-266
Number of pages: 12
Journal: Transactions of the Association for Computational Linguistics
Volume: 1
Publication status: Published - 2013
Externally published: Yes
