Abstract
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
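As a rough illustration of how an adaptor of this kind can produce heavy-tailed token frequencies, the sketch below simulates the Chinese restaurant process representation of the Pitman-Yor process. The function name, the parameter values, and the reading of table sizes as stand-ins for word frequencies are assumptions made for illustration; this is not the paper's implementation.

```python
import random
from collections import Counter


def pitman_yor_table_sizes(num_tokens, discount=0.5, concentration=1.0, seed=0):
    """Seat num_tokens customers one at a time and return the list of table sizes.

    Hypothetical helper: a customer joins existing table k with probability
    proportional to (size_k - discount) and opens a new table with probability
    proportional to (num_tables * discount + concentration).
    """
    if not (0.0 <= discount < 1.0) or concentration <= -discount:
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    rng = random.Random(seed)
    sizes = []
    for n in range(num_tokens):
        k = len(sizes)
        if k == 0:
            # The first customer always opens the first table.
            sizes.append(1)
            continue
        p_new = (k * discount + concentration) / (n + concentration)
        if rng.random() < p_new:
            sizes.append(1)
        else:
            # Rich-get-richer step: larger tables attract more customers.
            weights = [s - discount for s in sizes]
            idx = rng.choices(range(k), weights=weights)[0]
            sizes[idx] += 1
    return sizes


if __name__ == "__main__":
    sizes = pitman_yor_table_sizes(2000, discount=0.8, concentration=1.0)
    # Count how many tables have each size (a frequency-of-frequencies view).
    histogram = Counter(sizes)
    for size in sorted(histogram)[:5]:
        print(size, histogram[size])
```

With a discount strictly between 0 and 1, the number of tables grows roughly as a power of the number of customers and most tables stay small while a few become very large, which is the qualitative pattern the abstract refers to.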
Original language | English |
---|---|
Title of host publication | Advances in Neural Information Processing Systems 18 - Proceedings of the 2005 Conference |
Editors | Yair Weiss, Bernhard Schölkopf, John Platt |
Place of Publication | Cambridge, MA |
Publisher | MIT Press |
Pages | 459-466 |
Number of pages | 8 |
ISBN (Print) | 9780262232531 |
Publication status | Published - 2005 |
Externally published | Yes |
Event | 2005 Annual Conference on Neural Information Processing Systems, NIPS - 2005 - Vancouver, Canada Duration: 5 Dec 2005 → 8 Dec 2005 |