Benjamin Börschinger*, Katherine Demuth, Mark Johnson
Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review
Studies of computational models of language acquisition depend in large part on the input available for experiments. In this paper, we study the effect that input size has on the performance of word segmentation models embodying different kinds of linguistic assumptions. Because currently available corpora for word segmentation are not suited for addressing this question, we perform our study on a novel corpus based on the Providence Corpus (Demuth et al., 2006). We find that input size can have dramatic effects on segmentation performance and that, somewhat surprisingly, models performing well on smaller amounts of data can show a marked decrease in performance when exposed to larger amounts of data. We also present the dataset on which we perform our experiments, which comprises longitudinal data for six children. This corpus makes it possible to ask more specific questions about computational models of word segmentation, in particular about intra-language variability and about how the performance of different models can change over time.
Original language | English |
---|---|
Title of host publication | 24th International Conference on Computational Linguistics |
Subtitle of host publication | Proceedings of COLING 2012: Technical Papers |
Editors | Martin Kay, Christian Boitet |
Place of Publication | Mumbai |
Publisher | Indian Institute of Technology |
Pages | 325-340 |
Number of pages | 16 |
Publication status | Published - 2012 |
Event | 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India. Duration: 8 Dec 2012 → 15 Dec 2012 |
Other | 24th International Conference on Computational Linguistics, COLING 2012 |
---|---|
Country/Territory | India |
City | Mumbai |
Period | 8/12/12 → 15/12/12 |