Why doesn't EM find good HMM POS-taggers?

Mark Johnson*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

109 Citations (Scopus)

Abstract

This paper investigates why the HMMs estimated by Expectation-Maximization (EM) produce such poor results as Part-of-Speech (POS) taggers. We find that the HMMs estimated by EM generally assign a roughly equal number of word tokens to each hidden state, while the empirical distribution of tokens to POS tags is highly skewed. This motivates a Bayesian approach using a sparse prior to bias the estimator toward such a skewed distribution. We investigate Gibbs Sampling (GS) and Variational Bayes (VB) estimators and show that VB converges faster than GS for this task and that VB significantly improves 1-to-1 tagging accuracy over EM. We also show that EM does nearly as well as VB when the number of hidden HMM states is dramatically reduced. We also point out the high variance in all of these estimators, and that they require many more iterations to approach convergence than usually thought.

Original language: English
Title of host publication: EMNLP-CoNLL 2007 - Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Editors: Frank Dignum
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 296-305
Number of pages: 10
Publication status: Published - 2007
Externally published: Yes
Event: 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007 - Prague, Czech Republic
Duration: 28 Jun 2007 → 28 Jun 2007

Other

Other: 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007
Country: Czech Republic
City: Prague
Period: 28/06/07 → 28/06/07
