A Hierarchical Dirichlet Process Model for Joint Part-of-Speech and Morphology Induction

Kairit Sirts, Tanel Alumäe

Research output: Chapter in Book/Report/Conference proceeding (Conference proceeding contribution)

10 Citations (Scopus)

Abstract

In this paper we present a fully unsupervised nonparametric Bayesian model that jointly induces part-of-speech (POS) tags and morphological segmentations. The model is essentially an infinite HMM that infers the number of states from data. Incorporating segmentation into the same model provides the morphological features to the system and eliminates the need to find them during a preprocessing step. We show that learning both tasks jointly actually leads to better results than learning either task when gold-standard data for the other task is provided. The evaluation on multilingual data shows that the model produces state-of-the-art results on POS induction.
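
The abstract describes the model as an infinite HMM built on a hierarchical Dirichlet process, where the number of states is inferred from data. As a rough illustration only, the sketch below samples from a generic truncated HDP-HMM prior: a shared global state distribution `beta` is drawn by stick-breaking with concentration `gamma`, and each state's transition row is drawn from a Dirichlet centred on `beta` with concentration `alpha`. It is not the authors' model or inference procedure: it omits the morphological segmentation component and any posterior sampling, and the function names, the truncation level, and drawing the initial state from `beta` are all hypothetical choices made for this example.

```python
import random

def stick_breaking(gamma, truncation, rng):
    """Truncated GEM(gamma) draw: global weights beta over `truncation` states."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, gamma)   # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)             # last state absorbs the leftover mass
    return weights

def dirichlet(concentrations, rng):
    """Dirichlet draw via normalised Gamma variates (tiny floor keeps alpha > 0)."""
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in concentrations]
    total = sum(draws)
    if total == 0.0:                       # numerically degenerate case: use uniform
        return [1.0 / len(draws)] * len(draws)
    return [d / total for d in draws]

def sample_hdp_hmm_prior(gamma, alpha, truncation, length, seed=0):
    """Sample a state sequence from a truncated HDP-HMM prior (hypothetical helper).

    beta ~ GEM(gamma) is shared by all states and each row pi_k ~ Dir(alpha * beta),
    so every state tends to transition into the same globally popular states.
    """
    rng = random.Random(seed)
    beta = stick_breaking(gamma, truncation, rng)
    transitions = [dirichlet([alpha * b for b in beta], rng) for _ in range(truncation)]
    # Assumption for this sketch: the initial state is drawn from beta.
    states = rng.choices(range(truncation), weights=beta)
    for _ in range(length - 1):
        row = transitions[states[-1]]
        states.append(rng.choices(range(truncation), weights=row)[0])
    return beta, transitions, states

beta, transitions, states = sample_hdp_hmm_prior(gamma=1.0, alpha=5.0, truncation=50, length=200)
print(len(set(states)), "distinct states used out of", len(beta))
```

Because most of the stick-breaking mass falls on a few states, a sampled sequence typically visits only some of the truncated states; in a nonparametric model this is what lets the effective number of tags be determined by the data rather than fixed in advance.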

Original language: English
Title of host publication: NAACL HLT 2012
Subtitle of host publication: The Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Proceedings of the Conference
Place of publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics
Pages: 407-416
Number of pages: 10
ISBN (Print): 9781937284206
Publication status: Published - 2012
Externally published: Yes
Event: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Montreal, Canada
Duration: 3 Jun 2012 to 8 Jun 2012

