Abstract
Word-final /t/-deletion refers to a common phenomenon in spoken English where words such as /wEst/ "west" are pronounced as [wEs] "wes" in certain contexts. Phonological variation like this is common in naturally occurring speech. Current computational models of unsupervised word segmentation usually assume idealized input that is devoid of these kinds of variation. We extend a non-parametric model of word segmentation by adding phonological rules that map from underlying forms to surface forms to produce a mathematically well-defined joint model as a first step towards handling variation and segmentation in a single model. We analyse how our model handles /t/-deletion on a large corpus of transcribed speech, and show that the joint model can perform word segmentation and recover underlying /t/s. We find that bigram dependencies are important for performing well on real data and for learning appropriate deletion probabilities for different contexts.
Original language | English |
---|---|
Title of host publication | Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics |
Subtitle of host publication | ACL 2013: 4-9 August, Sofia, Bulgaria |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1508-1516 |
Number of pages | 9 |
Volume | 1 |
ISBN (Print) | 9781937284503 |
Publication status | Published - 2013 |
Event | 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 - Sofia, Bulgaria. Duration: 4 Aug 2013 → 9 Aug 2013 |
Other
Other | 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 |
---|---|
Country/Territory | Bulgaria |
City | Sofia |
Period | 4/08/13 → 9/08/13 |