Whyisenglishsoeasytosegment?

Abdellah Fourtassi, Benjamin Börschinger, Mark Johnson, Emmanuel Dupoux

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

Abstract

Cross-linguistic studies on unsupervised word segmentation have consistently shown that English is easier to segment than other languages. In this paper, we propose an explanation of this finding based on the notion of segmentation ambiguity. We show that English has a very low segmentation ambiguity compared to Japanese and that this difference correlates with the segmentation performance in a unigram model. We suggest that segmentation ambiguity is linked to a trade-off between syllable structure complexity and word length distribution.
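To make the notion of segmentation ambiguity concrete, the following minimal sketch counts how many ways an unsegmented string can be split into words from a given lexicon. It is only an illustration of the general idea that one string may admit several parses. It is not the ambiguity measure defined in the paper, and the function name `count_segmentations` and both toy lexicons are hypothetical.

```python
def count_segmentations(text, lexicon):
    """Count the ways `text` can be split into a sequence of lexicon words."""
    # ways[i] holds the number of segmentations of the prefix text[:i].
    ways = [0] * (len(text) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) segmentation
    max_len = max(map(len, lexicon), default=0)
    for end in range(1, len(text) + 1):
        # Only look back as far as the longest word in the lexicon.
        for start in range(max(0, end - max_len), end):
            if ways[start] and text[start:end] in lexicon:
                ways[end] += ways[start]
    return ways[len(text)]


# Toy lexicons (assumptions for illustration, not data from the paper).
unambiguous_lexicon = {"why", "is", "english", "so", "easy", "to", "segment"}
ambiguous_lexicon = {"a", "an", "nice", "ice", "man", "iceman"}

n_title = count_segmentations("whyisenglishsoeasytosegment", unambiguous_lexicon)
n_iceman = count_segmentations("aniceman", ambiguous_lexicon)
print(n_title, n_iceman)
```

With these toy lexicons the paper's title has a single parse, whereas "aniceman" can be read as "a nice man", "an ice man", or "an iceman". A string with more competing parses is, informally, more ambiguous for a learner that must choose among them.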
Original language: English
Title of host publication: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Subtitle of host publication: CMCL 2013 : August 8, 2013, Sofia, Bulgaria
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics
Pages: 1-10
Number of pages: 10
ISBN (Print): 9781937284619
Publication status: Published - 2013
Event: Annual Workshop on Cognitive Modeling and Computational Linguistics (4th : 2013) - Sofia, Bulgaria
Duration: 8 Aug 2013 → 8 Aug 2013

Workshop

Workshop: Annual Workshop on Cognitive Modeling and Computational Linguistics (4th : 2013)
City: Sofia, Bulgaria
Period: 8/08/13 → 8/08/13

Bibliographical note

Title on paper is 'Whyisenglishsoeasytosegment?'


  • Cite this

    Fourtassi, A., Börschinger, B., Johnson, M., & Dupoux, E. (2013). Whyisenglishsoeasytosegment? In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics: CMCL 2013 : August 8, 2013, Sofia, Bulgaria (pp. 1-10). Stroudsburg, PA: Association for Computational Linguistics.