Producing power-law distributions and damping word frequencies with two-stage language models

Sharon Goldwater, Thomas L. Griffiths, Mark Johnson

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
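The two-stage mechanism summarized above is easy to simulate. The sketch below is illustrative only, not the authors' implementation: it pairs a hypothetical generator (here a uniform draw from a large vocabulary) with the Pitman-Yor Chinese restaurant process as the adaptor, where a and b are the discount and concentration parameters of the two-parameter process. With a > 0, the adapted token frequencies show the heavy, power-law tail the abstract describes.

import random
from collections import Counter

def pitman_yor_crp(n_tokens, a=0.8, b=1.0, generator=None, rng=None):
    """Two-stage sampler: 'generator' proposes word types; the Pitman-Yor
    Chinese restaurant process (the adaptor) decides how often each type
    recurs. a is the discount (0 <= a < 1), b the concentration (b > -a)."""
    rng = rng or random.Random(0)
    # Hypothetical generator for illustration: uniform over a large vocabulary.
    generator = generator or (lambda: "w%d" % rng.randrange(10**6))
    tables, labels, tokens = [], [], []
    for n in range(n_tokens):  # n = number of customers already seated
        K = len(tables)
        if rng.random() * (n + b) < b + a * K:
            # Open a new table with probability (b + a*K) / (n + b):
            # the generator emits a (possibly repeated) word type.
            tables.append(1)
            labels.append(generator())
            tokens.append(labels[-1])
        else:
            # Join existing table k with probability (c_k - a) / (n + b).
            r = rng.random() * (n - a * K)
            for k, c in enumerate(tables):
                r -= c - a
                if r <= 0:
                    tables[k] += 1
                    tokens.append(labels[k])
                    break
            else:  # guard against floating-point rounding
                tables[-1] += 1
                tokens.append(labels[-1])
    return tokens

tokens = pitman_yor_crp(50_000)
ranked = sorted(Counter(tokens).values(), reverse=True)
# With a > 0 the rank-frequency curve is roughly linear in log-log space,
# i.e. a power law like the one observed for natural-language word tokens.
for rank in (1, 10, 100, 1000):
    if rank <= len(ranked):
        print(rank, ranked[rank - 1])

Setting a = 0 recovers the single-parameter Chinese restaurant process, which corresponds to the Dirichlet process adaptor discussed in the abstract.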

Language: English
Pages: 2335-2382
Number of pages: 48
Journal: Journal of Machine Learning Research
Volume: 12
Publication status: Published - Jul 2011

Cite this

@article{1c6a0d85e58a48a6979adf766866c2c9,
title = "Producing power-law distributions and damping word frequencies with two-stage language models",
abstract = "Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.",
author = "Sharon Goldwater and Griffiths, {Thomas L.} and Mark Johnson",
year = "2011",
month = "7",
language = "English",
volume = "12",
pages = "2335--2382",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
}

Producing power-law distributions and damping word frequencies with two-stage language models. / Goldwater, Sharon; Griffiths, Thomas L.; Johnson, Mark.

In: Journal of Machine Learning Research, Vol. 12, 07.2011, p. 2335-2382.

TY - JOUR

T1 - Producing power-law distributions and damping word frequencies with two-stage language models

AU - Goldwater, Sharon

AU - Griffiths, Thomas L.

AU - Johnson, Mark

PY - 2011/7

Y1 - 2011/7

N2 - Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.

AB - Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.

UR - http://www.scopus.com/inward/record.url?scp=80052252167&partnerID=8YFLogxK

M3 - Article

VL - 12

SP - 2335

EP - 2382

JO - Journal of Machine Learning Research

T2 - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -