A comparative study of parameter estimation methods for statistical natural language processing

Jianfeng Gao*, Galen Andrew, Mark Johnson, Kristina Toutanova

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review

42 Citations (Scopus)

Abstract

This paper presents a comparative study of five parameter estimation algorithms on four NLP tasks. Three of the five algorithms are well-known in the computational linguistics community: Maximum Entropy (ME) estimation with L2 regularization, the Averaged Perceptron (AP), and Boosting. We also investigate ME estimation with L1 regularization using a novel optimization algorithm, and BLasso, which is a version of Boosting with Lasso (L1) regularization. We first investigate all of our estimators on two re-ranking tasks: a parse selection task and a language model (LM) adaptation task. Then we apply the best of these estimators to two additional tasks involving conditional sequence models: a Conditional Markov Model (CMM) for part-of-speech tagging and a Conditional Random Field (CRF) for Chinese word segmentation. Our experiments show that across tasks, three of the estimators (ME estimation with L1 or L2 regularization, and AP) are in a near statistical tie for first place.

Original language: English
Title of host publication: ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
Place of Publication: East Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 824-831
Number of pages: 8
ISBN (Print): 9781932432862
Publication status: Published - 2007
Externally published: Yes
Event: 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007 - Prague, Czech Republic
Duration: 23 Jun 2007 to 30 Jun 2007

Other

Other: 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007
Country/Territory: Czech Republic
City: Prague
Period: 23/06/07 to 30/06/07
