Bayesian inference for PCFGs via Markov Chain Monte Carlo

Mark Johnson*, Thomas L. Griffiths, Sharon Goldwater

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution

120 Citations (Scopus)

Abstract

This paper presents two Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference of probabilistic context-free grammars (PCFGs) from terminal strings, providing an alternative to maximum-likelihood estimation using the Inside-Outside algorithm. We illustrate these methods by estimating a sparse grammar describing the morphology of the Bantu language Sesotho, demonstrating that with suitable priors Bayesian techniques can infer linguistic structure in situations where maximum-likelihood methods such as the Inside-Outside algorithm only produce a trivial grammar.
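
The abstract does not spell out the priors or the form of the samplers, so the following is only a minimal sketch of one step that a Gibbs-style sampler for a PCFG could use: redrawing the rule probabilities given rule counts from the currently sampled trees. The symmetric Dirichlet prior, the function name `sample_rule_probs`, the `alpha` parameter, and the toy rule counts are all assumptions made for illustration, not details taken from the paper. A small `alpha` (below 1) favors sparse grammars, which is one way a prior can steer inference away from a trivial solution.

```python
import random
from collections import defaultdict


def sample_rule_probs(rule_counts, alpha=0.1, rng=None):
    """Draw rule probabilities from a Dirichlet posterior, one draw per left-hand side.

    rule_counts maps (lhs, rhs) -> number of times the rule is used in the
    current sampled trees. alpha is a symmetric Dirichlet hyperparameter
    (an assumption of this sketch, not a value from the paper).
    """
    rng = rng or random.Random()

    # Group the rules by their left-hand-side nonterminal.
    by_lhs = defaultdict(list)
    for (lhs, rhs), count in rule_counts.items():
        by_lhs[lhs].append((rhs, count))

    theta = {}
    for lhs, expansions in by_lhs.items():
        # A Dirichlet draw is a vector of independent Gamma draws, normalized.
        draws = [rng.gammavariate(alpha + count, 1.0) for _, count in expansions]
        total = sum(draws)
        for (rhs, _), g in zip(expansions, draws):
            # Fall back to a uniform split in the rare case of numerical underflow.
            theta[(lhs, rhs)] = g / total if total > 0 else 1.0 / len(expansions)
    return theta


# Hypothetical rule counts for a toy morphology grammar.
toy_counts = {
    ("Word", ("Stem", "Suffix")): 8,
    ("Word", ("Stem",)): 2,
    ("Stem", ("a",)): 0,
    ("Stem", ("b",)): 5,
}
toy_theta = sample_rule_probs(toy_counts, alpha=0.1, rng=random.Random(0))
```

In a full sampler this step would alternate with resampling the parse trees given the current rule probabilities; that second step is omitted here.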

Original language: English
Title of host publication: NAACL HLT 2007 - Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 139-146
Number of pages: 8
Publication status: Published - 2007
Externally published: Yes
Event: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007 - Rochester, NY, United States
Duration: 22 Apr 2007 to 27 Apr 2007

Other

Other: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007
Country: United States
City: Rochester, NY
Period: 22/04/07 to 27/04/07
