A Bayesian LDA-based model for semi-supervised part-of-speech tagging

Kristina Toutanova*, Mark Johnson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review

34 Citations (Scopus)


We present a novel Bayesian model for semi-supervised part-of-speech tagging. Our model extends the Latent Dirichlet Allocation model and incorporates the intuition that words' distributions over tags, p(t|w), are sparse. In addition, we introduce a model for determining the set of possible tags of a word, which captures important dependencies in the ambiguity classes of words. Our model outperforms the best previously proposed model for this task on a standard dataset.
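The sparsity intuition for p(t|w) can be illustrated with a small sketch. It assumes that sparsity is encoded by a symmetric Dirichlet prior with a concentration parameter below 1, which is a common choice in LDA-style models; the abstract does not state the exact prior, so this parameterization, the tag-set size, the threshold, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k outcomes.

    Uses the standard construction: normalize k independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def effective_tags(dist, threshold=0.05):
    """Count tags whose probability exceeds a (hypothetical) threshold."""
    return sum(1 for p in dist if p > threshold)


def mean_effective_tags(alpha, k=10, n_words=2000, seed=0):
    """Average number of non-negligible tags per word under the prior.

    Each simulated word gets its own tag distribution p(t|w) drawn from
    Dirichlet(alpha); k and n_words are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    counts = (effective_tags(sample_dirichlet(alpha, k, rng)) for _ in range(n_words))
    return sum(counts) / n_words


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        print(f"alpha={alpha}: mean effective tags = {mean_effective_tags(alpha):.2f}")
```

Under these assumptions, a small concentration parameter places most of each word's probability mass on a few tags, while a large one spreads it almost evenly, which is the contrast the sparsity intuition relies on.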

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
Editors: John C. Platt, Daphne Koller, Yoram Singer, Sam T. Roweis
Place of Publication: La Jolla, California
Number of pages: 8
Publication status: Published - 2009
Externally published: Yes
Event: 21st Annual Conference on Neural Information Processing Systems, NIPS 2007 - Vancouver, BC, Canada
Duration: 3 Dec 2007 to 6 Dec 2007


Other: 21st Annual Conference on Neural Information Processing Systems, NIPS 2007
City: Vancouver, BC
