Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars

Mark Johnson*, Sharon Goldwater

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

77 Citations (Scopus)

Abstract

One of the reasons nonparametric Bayesian inference is attracting attention in computational linguistics is that it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparametric Bayesian models. This paper investigates some of the choices that arise in formulating adaptor grammars and associated inference procedures, and shows that they can have a dramatic impact on performance in an unsupervised word segmentation task. With appropriate adaptor grammars and inference procedures we achieve an 87% word token f-score on the standard Brent version of the Bernstein-Ratner corpus, which is an error reduction of over 35% relative to the best previously reported results for this corpus.
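The word token f-score named in the abstract can be illustrated with a minimal sketch. It assumes the usual convention in the word segmentation literature, which the abstract itself does not spell out: a predicted token counts as correct only if both of its boundaries match a gold token. The function names and the example utterance are hypothetical and are not taken from the paper.

```python
def token_spans(words):
    """Map a segmented utterance (list of words) to a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def token_f_score(gold_utterances, predicted_utterances):
    """Harmonic mean of token precision and recall over a corpus (assumed convention)."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_utterances, predicted_utterances):
        # Both segmentations must cover the same unsegmented character string.
        if "".join(gold) != "".join(pred):
            raise ValueError("gold and predicted utterances differ in content")
        gold_spans = token_spans(gold)
        pred_spans = token_spans(pred)
        correct += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction wrongly merges "the" and "book".
gold = [["you", "want", "to", "see", "the", "book"]]
pred = [["you", "want", "to", "see", "thebook"]]
score = token_f_score(gold, pred)  # precision 4/5, recall 4/6
```

In this example four of the five predicted tokens are correct and four of the six gold tokens are recovered, so the score is the harmonic mean of 4/5 and 4/6.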

Original language: English
Title of host publication: NAACL '09 Proceedings of Human Language Technologies
Subtitle of host publication: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 317-325
Number of pages: 9
ISBN (Print): 9781932432411
Publication status: Published - 2009
Externally published: Yes
Event: Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, NAACL HLT (10th : 2009) - Boulder, United States
Duration: 31 May 2009 to 5 Jun 2009

Other

Other: Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, NAACL HLT (10th : 2009)
Country: United States
City: Boulder
Period: 31/05/09 to 5/06/09
