Unsupervised phonemic Chinese word segmentation using adaptor grammars

Mark Johnson*, Katherine Demuth

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution

5 Citations (Scopus)

Abstract

Adaptor grammars are a framework for expressing and performing inference over a variety of non-parametric linguistic models. These models currently provide state-of-the-art performance on unsupervised word segmentation from phonemic representations of child-directed unsegmented English utterances. This paper investigates the applicability of these models to unsupervised word segmentation of Mandarin. We investigate a wide variety of segmentation models, and show that the best segmentation accuracy is obtained from models that capture inter-word "collocational" dependencies. Surprisingly, enhancing the models to exploit syllable structure regularities and to capture tone information does not improve overall word segmentation accuracy, perhaps because the information these elements convey is redundant when compared to the inter-word dependencies.

Original language: English
Title of host publication: Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference
Editors: Chu-Ren Huang, Dan Jurafsky
Place of Publication: China
Publisher: Press of Tsinghua University
Pages: 528-536
Number of pages: 9
Volume: 2
Publication status: Published - 2010
Event: 23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China
Duration: 23 Aug 2010 to 27 Aug 2010

Other

Other: 23rd International Conference on Computational Linguistics, Coling 2010
Country: China
City: Beijing
Period: 23/08/10 to 27/08/10


  • Cite this

    Johnson, M., & Demuth, K. (2010). Unsupervised phonemic Chinese word segmentation using adaptor grammars. In C-R. Huang, & D. Jurafsky (Eds.), Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference (Vol. 2, pp. 528-536). China: Press of Tsinghua University.