Unsupervised phonemic Chinese word segmentation using adaptor grammars

Mark Johnson*, Katherine Demuth

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

10 Citations (Scopus)

Abstract

Adaptor grammars are a framework for expressing and performing inference over a variety of non-parametric linguistic models. These models currently provide state-of-the-art performance on unsupervised word segmentation from phonemic representations of child-directed unsegmented English utterances. This paper investigates the applicability of these models to unsupervised word segmentation of Mandarin. We investigate a wide variety of segmentation models, and show that the best segmentation accuracy is obtained from models that capture inter-word "collocational" dependencies. Surprisingly, enhancing the models to exploit syllable structure regularities and to capture tone information does not improve overall word segmentation accuracy, perhaps because the information these elements convey is redundant when compared to the inter-word dependencies.

Original language: English
Title of host publication: Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference
Editors: Chu-Ren Huang, Dan Jurafsky
Place of Publication: China
Publisher: Press of Tsinghua University
Pages: 528-536
Number of pages: 9
Volume: 2
Publication status: Published - 2010
Event: 23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China
Duration: 23 Aug 2010 to 27 Aug 2010
