A joint model of word segmentation and phonological variation for English word-final /t/-deletion

Benjamin Börschinger, Mark Johnson, Katherine Demuth

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

3 Citations (Scopus)

Abstract

Word-final /t/-deletion refers to a common phenomenon in spoken English where words such as /wEst/ "west" are pronounced as [wEs] "wes" in certain contexts. Phonological variation like this is common in naturally occurring speech. Current computational models of unsupervised word segmentation usually assume idealized input that is devoid of these kinds of variation. We extend a non-parametric model of word segmentation by adding phonological rules that map from underlying forms to surface forms to produce a mathematically well-defined joint model as a first step towards handling variation and segmentation in a single model. We analyse how our model handles /t/-deletion on a large corpus of transcribed speech, and show that the joint model can perform word segmentation and recover underlying /t/s. We find that Bigram dependencies are important for performing well on real data and for learning appropriate deletion probabilities for different contexts.
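The idea of a rule that maps underlying forms to surface forms with a context-dependent deletion probability can be illustrated with a minimal Python sketch. This is not the authors' model: the context classes (`consonant`, `vowel`, `pause`), the probability values in `DELETION_PROB`, and all function names are assumptions made up for illustration, and the non-parametric segmentation component is omitted entirely.

```python
# Illustrative sketch only: contexts, probabilities, and names are hypothetical,
# not taken from the paper.
VOWELS = set("aeiouAEIOU")

# Assumed deletion probabilities, keyed by the context following the word.
DELETION_PROB = {"consonant": 0.6, "vowel": 0.2, "pause": 0.1}


def following_context(next_word):
    """Classify the right-hand context by the first segment of the next word."""
    if not next_word:
        return "pause"
    return "vowel" if next_word[0] in VOWELS else "consonant"


def surface_prob(underlying, surface, next_word):
    """P(surface | underlying, context) under one optional /t/-deletion rule."""
    if underlying.endswith("t"):
        p_del = DELETION_PROB[following_context(next_word)]
        if surface == underlying:
            return 1.0 - p_del      # /t/ is realized
        if surface == underlying[:-1]:
            return p_del            # /t/ is deleted
        return 0.0
    # Words without a final /t/ surface unchanged.
    return 1.0 if surface == underlying else 0.0


def underlying_candidates(surface):
    """Underlying forms that could produce this surface form."""
    return [surface, surface + "t"]
```

Under these assumptions, recovering an underlying /t/ amounts to weighing each candidate from `underlying_candidates` by `surface_prob` together with whatever probability a word model assigns to that candidate in its context.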

Original language: English
Title of host publication: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: ACL 2013: 4-9 August, Sofia, Bulgaria
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 1508-1516
Number of pages: 9
Volume: 1
ISBN (Print): 9781937284503
Publication status: Published - 2013
Event: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 - Sofia, Bulgaria
Duration: 4 Aug 2013 - 9 Aug 2013

Other

Other: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013
Country: Bulgaria
City: Sofia
Period: 4/08/13 - 9/08/13
