Abstract
A data-oriented parsing (DOP) model for statistical parsing associates fragments of linguistic representations with numerical weights, where these weights are estimated by normalizing the empirical frequency of each fragment in a training corpus (see Bod [1998] and references cited therein). This note observes that this estimation method is biased and inconsistent; inconsistency here means that the estimated distribution does not in general converge on the true distribution as the size of the training corpus increases.
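The estimator described above can be sketched in Python. This is a minimal illustration, not code from the note. It assumes Bod's DOP1 convention: each fragment's corpus count is normalized by the total count of all fragments that share its root label. The tree encoding is a hypothetical choice made for this sketch: nested tuples, with `(label,)` marking a substitution site. The function names `fragments_at`, `dop1_weights`, and `tree_prob` are also hypothetical. `tree_prob` computes the distribution over trees that these weights induce. It sums, over every derivation of a tree, the product of the weights of the fragments used. Fragment enumeration grows exponentially with tree size, so the sketch only suits toy corpora.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical encoding: a node is (label, child1, child2, ...); a terminal is a str.
# In a fragment, a frontier nonterminal (substitution site) is written (label,).

def fragments_at(node):
    """Return (fragment, cut_nodes) pairs for every fragment rooted at `node`.
    Each nonterminal child is either cut to a substitution site or expanded
    by one of its own fragments; cut_nodes lists the tree nodes left to fill."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([(child, [])])  # terminals are always kept
        else:
            options.append([((child[0],), [child])] + fragments_at(child))
    result = []
    for combo in product(*options):
        frag = (label, *(f for f, _ in combo))
        sites = [s for _, cut in combo for s in cut]
        result.append((frag, sites))
    return result

def nodes(tree):
    """Yield every nonterminal node of a tree in preorder."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def dop1_weights(corpus):
    """Relative-frequency (DOP1-style, assumed) estimate: count every fragment
    occurrence in the corpus, then normalize within each root label."""
    counts = Counter(f for tree in corpus for n in nodes(tree)
                     for f, _ in fragments_at(n))
    root_totals = Counter()
    for frag, c in counts.items():
        root_totals[frag[0]] += c
    return {frag: Fraction(c, root_totals[frag[0]]) for frag, c in counts.items()}

def tree_prob(tree, weights):
    """Probability the weighted fragments assign to `tree`: the sum over all
    derivations of the product of the weights of the fragments used."""
    total = Fraction(0)
    for frag, sites in fragments_at(tree):
        p = weights.get(frag, Fraction(0))
        for site in sites:
            p *= tree_prob(site, weights)
        total += p
    return total
```

Take a toy corpus of the two trees `("S", ("A", "a"), ("B", "b"))` and `("S", ("A", "a"), ("B", "c"))`. `dop1_weights` yields 9 distinct fragments. Among them, `("S", ("A",), ("B",))` occurs twice among the 8 S-rooted fragment occurrences and so receives weight 1/4. `tree_prob` then assigns each tree probability 1/2. The note's claim concerns how estimates of this kind behave as the corpus grows. The sketch only shows how the weights are computed.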
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 71-76 |
Number of pages | 6 |
Journal | Computational Linguistics |
Volume | 28 |
Issue number | 1 |
DOIs | |
Publication status | Published - Mar 2002 |
Externally published | Yes |