Sign constraints on feature weights improve a joint model of word segmentation and phonology

Mark Johnson, Joe Pater, Robert Staubs, Emmanuel Dupoux

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review


Abstract

This paper describes a joint model of word segmentation and phonological alternations, which takes unsegmented utterances as input and infers word segmentations and underlying phonological representations. The model is a Maximum Entropy or log-linear model, which can express a probabilistic version of Optimality Theory (OT; Prince and Smolensky, 2004), a standard phonological framework. The features in our model are inspired by OT's Markedness and Faithfulness constraints. Following the OT principle that such features indicate "violations", we require their weights to be non-positive. We apply our model to a modified version of the Buckeye corpus (Pitt et al., 2007) in which the only phonological alternations are deletions of word-final /d/ and /t/ segments. The model sets a new state-of-the-art for this corpus for word segmentation, identification of underlying forms, and identification of /d/ and /t/ deletions. We also show that the OT-inspired sign constraints on feature weights are crucial for accurate identification of deleted /d/s; without them our model posits approximately 10 times more deleted underlying /d/s than appear in the manually annotated data.
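
As a rough illustration of the sign constraint described in the abstract, the sketch below fits a tiny log-linear model whose weights are kept non-positive by clipping after every gradient step (projected gradient ascent). It is not the authors' implementation and does not perform joint word segmentation. The two features, the toy candidates, the counts, and the function names `maxent_probs` and `fit_nonpositive` are all assumptions made for the example.

```python
import math

def maxent_probs(weights, candidates):
    """Log-linear distribution over candidates.

    Each candidate is a list of violation counts, one per feature.
    """
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in candidates]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_nonpositive(data, n_features, lr=0.1, epochs=200):
    """Maximise log-likelihood while keeping every weight <= 0.

    data is a list of (candidates, observed_index) pairs. After each
    gradient step the weights are projected back onto the non-positive
    orthant by clipping at zero.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for candidates, observed in data:
            probs = maxent_probs(weights, candidates)
            for k in range(n_features):
                expected = sum(p * c[k] for p, c in zip(probs, candidates))
                grad[k] += candidates[observed][k] - expected
        weights = [min(0.0, w + lr * g) for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy setup (not taken from the paper):
#   feature 0: a Markedness-style violation (word-final /t/ kept on the surface)
#   feature 1: a Faithfulness-style violation (underlying /t/ deleted)
FAITHFUL = [1, 0]  # surface form keeps the /t/
DELETED = [0, 1]   # surface form drops the /t/
candidates = [FAITHFUL, DELETED]

# Invented observations: the /t/ is kept three times and deleted once.
data = [(candidates, 0)] * 3 + [(candidates, 1)]

weights = fit_nonpositive(data, n_features=2)
probs = maxent_probs(weights, candidates)
```

In this toy run the unconstrained gradient would push the Markedness-style weight above zero, so the projection pins it at zero and the Faithfulness-style weight alone accounts for the observed preference. Clipping is only one way to impose a sign constraint; a bound-constrained optimiser would serve the same purpose.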
Original language: English
Title of host publication: NAACL HLT 2015
Subtitle of host publication: the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies : proceedings
Place of Publication: Red Hook, New York
Publisher: Association for Computational Linguistics (ACL)
Pages: 303-313
Number of pages: 11
ISBN (Print): 9781941643495
Publication status: Published - 2015
Event: Conference of the North American Chapter of the Association for Computational Linguistics : human language technologies - Denver, CO
Duration: 31 May 2015 to 5 Jun 2015

Conference

Conference: Conference of the North American Chapter of the Association for Computational Linguistics : human language technologies
City: Denver, CO
Period: 31/05/15 to 5/06/15

Bibliographical note

Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
