Abstract
Studies of computational models of language acquisition depend in large part on the input available for experiments. In this paper, we study the effect that input size has on the performance of word segmentation models embodying different kinds of linguistic assumptions. Because currently available corpora for word segmentation are not suited to addressing this question, we perform our study on a novel corpus based on the Providence Corpus (Demuth et al., 2006). We find that input size can have dramatic effects on segmentation performance and that, somewhat surprisingly, models performing well on smaller amounts of data can show a marked decrease in performance when exposed to larger amounts of data. We also present the dataset on which we perform our experiments, comprising longitudinal data for six children. This corpus makes it possible to ask more specific questions about computational models of word segmentation, in particular about intra-language variability and about how the performance of different models can change over time.
| Original language | English |
|---|---|
| Title of host publication | 24th International Conference on Computational Linguistics |
| Subtitle of host publication | Proceedings of COLING 2012: Technical Papers |
| Editors | Martin Kay, Christian Boitet |
| Place of Publication | Mumbai |
| Publisher | Indian Institute of Technology |
| Pages | 325-340 |
| Number of pages | 16 |
| Publication status | Published - 2012 |
| Event | 24th International Conference on Computational Linguistics, COLING 2012, Mumbai, India (8 Dec 2012 → 15 Dec 2012) |
| Title | Studying the effect of input size for Bayesian word segmentation on the Providence Corpus |