Abstract
In this paper we present a fully unsupervised nonparametric Bayesian model that jointly induces POS tags and morphological segmentations. The model is essentially an infinite HMM that infers the number of states from the data. Incorporating segmentation into the same model supplies morphological features to the system and eliminates the need to extract them in a preprocessing step. We show that learning both tasks jointly leads to better results than learning either task with gold-standard data from the other task provided. Evaluation on multilingual data shows that the model produces state-of-the-art results on POS induction.
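As a rough illustration of how a nonparametric prior can let the number of states be inferred from data rather than fixed in advance, the sketch below samples assignments from a Chinese restaurant process (CRP). This is not the model from the paper: an infinite HMM is usually built on a hierarchical extension of this idea, and the function name `crp_assignments` and the concentration parameter `alpha` are assumptions made only for this example.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample state assignments for n_items from a Chinese restaurant process.

    Hypothetical helper for illustration only; alpha (> 0) controls how
    readily a new state is opened.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items already assigned to state k
    assignments = []
    for _ in range(n_items):
        # An existing state k is chosen with weight counts[k];
        # a brand-new state is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new state
        counts[k] += 1
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    z = crp_assignments(1000, alpha=1.0)
    print("states used:", len(set(z)))
```

The number of distinct states is not a parameter here: it grows slowly as more items are seen, which is the property the abstract refers to when it says the model infers the number of states from data.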
Original language | English |
---|---|
Title of host publication | NAACL HLT 2012 |
Subtitle of host publication | The Conference of the North American Chapter of the Association for Computational Linguistics : Human Language Technologies : Proceedings of the Conference |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics |
Pages | 407-416 |
Number of pages | 10 |
ISBN (Print) | 9781937284206 |
Publication status | Published - 2012 |
Externally published | Yes |
Event | Conference of the North American Chapter of the Association for Computational Linguistics : Human Language Technologies - Montreal, Canada; Duration: 3 Jun 2012 → 8 Jun 2012 |
Conference
Conference | Conference of the North American Chapter of the Association for Computational Linguistics : Human Language Technologies |
---|---|
City | Montreal, Canada |
Period | 3/06/12 → 8/06/12 |