Interpolating between types and tokens by estimating power-law generators

Sharon Goldwater*, Thomas L. Griffiths, Mark Johnson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

67 Citations (Scopus)

Abstract

Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 18 - Proceedings of the 2005 Conference
Editors: Yair Weiss, Bernhard Schölkopf, John Platt
Place of Publication: Cambridge, MA
Publisher: MIT Press
Pages: 459-466
Number of pages: 8
ISBN (Print): 9780262232531
Publication status: Published - 2005
Externally published: Yes
Event: 2005 Annual Conference on Neural Information Processing Systems, NIPS - 2005 - Vancouver, Canada
Duration: 5 Dec 2005 to 8 Dec 2005

Other

Other: 2005 Annual Conference on Neural Information Processing Systems, NIPS - 2005
Country: Canada
City: Vancouver
Period: 5/12/05 to 8/12/05
