Abstract
The problem of estimating an author’s vocabulary, given a sample of the author’s writings, is considered. It is assumed that the vocabulary is fixed and finite, and that the author writes a composition by successively drawing words from this collection, independently of the previous configuration. Attention is focussed on the random variable X(n), the total number of different words used in a sample of n. It is shown that under fairly general conditions, the distribution of X(n), suitably normalized and scaled, is asymptotically Gaussian, and this result may be used to obtain a large sample estimator of vocabulary size.
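The sampling model in the abstract can be illustrated with a small simulation. The sketch below assumes a hypothetical Zipf-like set of word probabilities (not taken from the paper) and compares simulated values of X(n) with the expectation E[X(n)] = Σ_i (1 − (1 − p_i)^n), which follows from the independent-draws assumption. The function names are illustrative only, and the code does not reproduce the paper's estimator or its normalization.

```python
import random


def zipf_like(vocab_size):
    """Hypothetical word probabilities proportional to 1/rank (an assumption)."""
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_distinct(probs, n):
    """E[X(n)]: word i appears at least once with probability 1 - (1 - p_i)^n."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def simulate_distinct(probs, n, rng):
    """Draw n words independently and count how many different words appear."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    return len(set(draws))


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed so runs are repeatable
    probs = zipf_like(50)       # assumed vocabulary of 50 words
    n = 100                     # sample size
    samples = [simulate_distinct(probs, n, rng) for _ in range(500)]
    print("simulated mean of X(n):", sum(samples) / len(samples))
    print("expected value of X(n):", expected_distinct(probs, n))
```

Plotting a histogram of the simulated values for larger vocabularies and sample sizes gives a rough visual check of the bell-shaped behaviour the abstract describes, under these assumed probabilities.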
| Original language | English |
|---|---|
| Pages (from-to) | 92-96 |
| Number of pages | 5 |
| Journal | Journal of the American Statistical Association |
| Volume | 68 |
| Issue number | 341 |
| Publication status | Published - 1973 |
| Externally published | Yes |