Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure

Mark Johnson*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review

55 Citations (Scopus)

Abstract

Adaptor grammars (Johnson et al., 2007b) are a non-parametric Bayesian extension of Probabilistic Context-Free Grammars (PCFGs) which in effect learn the probabilities of entire subtrees. In practice, this means that an adaptor grammar learns the structures useful for generating the training data as well as their probabilities. We present several different adaptor grammars that learn to segment phonemic input into words by modeling different linguistic properties of the input. One of the advantages of a grammar-based framework is that it is easy to combine grammars, and we use this ability to compare models that capture different kinds of linguistic structure. We show that incorporating both unsupervised syllabification and collocation-finding into the adaptor grammar significantly improves unsupervised word-segmentation accuracy over that achieved by adaptor grammars that model only one of these linguistic phenomena.
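
As a rough illustration of what "learning the probabilities of entire subtrees" can mean, the sketch below wraps a base generator in a Chinese-restaurant-process cache, so that whole previously generated strings are reused in proportion to how often they have already been produced. This is not the paper's implementation: it assumes the simpler Dirichlet-process case rather than a full adaptor grammar, and the names `CRPAdaptor`, `base_word`, the toy phoneme alphabet, and the parameter values are all hypothetical.

```python
import random
from collections import Counter


def base_word(rng, phonemes="abdgiktu", stop_prob=0.5):
    """Hypothetical base distribution: a random phoneme string of geometric length."""
    word = rng.choice(phonemes)
    while rng.random() >= stop_prob:
        word += rng.choice(phonemes)
    return word


class CRPAdaptor:
    """Chinese-restaurant-process cache around a base sampler (illustrative only)."""

    def __init__(self, base_sampler, alpha, rng):
        self.base_sampler = base_sampler  # generates a fresh item when the cache is bypassed
        self.alpha = alpha                # concentration: larger means more fresh items
        self.rng = rng
        self.counts = Counter()           # how often each whole item has been generated
        self.total = 0

    def sample(self):
        # With probability alpha / (total + alpha), draw a fresh item from the base;
        # otherwise reuse a cached item with probability proportional to its count.
        if self.rng.random() * (self.total + self.alpha) < self.alpha:
            item = self.base_sampler(self.rng)
        else:
            items = list(self.counts)
            weights = [self.counts[i] for i in items]
            item = self.rng.choices(items, weights=weights)[0]
        self.counts[item] += 1
        self.total += 1
        return item


if __name__ == "__main__":
    rng = random.Random(0)
    adaptor = CRPAdaptor(base_word, alpha=1.0, rng=rng)
    for _ in range(200):
        adaptor.sample()
    # Frequently reused strings behave like memorized "words".
    print(adaptor.counts.most_common(5))
```

The design point is the rich-get-richer reuse: items that have been generated often become cheap to generate again, which is the sense in which whole structures, and not only individual rules, acquire probabilities.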

Original language: English
Title of host publication: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 398-406
Number of pages: 9
ISBN (Print): 9781932432046
Publication status: Published - 2008
Externally published: Yes
Event: 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT - Columbus, OH, United States
Duration: 15 Jun 2008 to 20 Jun 2008

Other

Other: 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT
Country/Territory: United States
City: Columbus, OH
Period: 15/06/08 to 20/06/08
