Abstract
A data-oriented parsing (DOP) model for statistical parsing associates fragments of linguistic representations with numerical weights, which are estimated by normalizing the empirical frequency of each fragment in a training corpus (see Bod [1998] and references cited therein). This note observes that this estimation method is biased and inconsistent; that is, the estimated distribution does not in general converge on the true distribution as the size of the training corpus increases.
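
The normalization step described in the abstract can be illustrated with a minimal sketch. It assumes that fragment counts are normalized over fragments sharing the same root label, as in the DOP1 model; the function name `relative_frequency_weights`, the `(root, body)` tuple encoding, and the toy fragments are hypothetical and serve only as illustration. The sketch shows how the weights are computed, not the note's argument about bias and inconsistency.

```python
from collections import Counter
from fractions import Fraction


def relative_frequency_weights(fragments):
    """Map each (root, body) fragment to its count divided by the
    total count of all fragments with the same root label."""
    counts = Counter(fragments)

    # Total number of fragment occurrences per root label.
    root_totals = Counter()
    for (root, _body), n in counts.items():
        root_totals[root] += n

    # Exact fractions avoid floating-point rounding in the illustration.
    return {
        frag: Fraction(n, root_totals[frag[0]])
        for frag, n in counts.items()
    }


# Hypothetical fragment occurrences extracted from a toy corpus.
toy_fragments = [
    ("S", "S -> NP VP"),
    ("S", "S -> NP VP"),
    ("S", "S -> (NP she) VP"),
    ("NP", "NP -> she"),
]

weights = relative_frequency_weights(toy_fragments)
for fragment, weight in weights.items():
    print(fragment, weight)
```

By construction, the weights of all fragments with a given root sum to one, so each root label carries its own normalized distribution over fragments.
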
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 71-76 |
| Number of pages | 6 |
| Journal | Computational Linguistics |
| Volume | 28 |
| Issue number | 1 |
| Publication status | Published - Mar 2002 |
| Externally published | Yes |