Edit detection and parsing for transcribed speech

Eugene Charniak, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

126 Citations (Scopus)
22 Downloads (Pure)

Abstract

We present a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical parser trained on transcribed speech parses the remaining words. The edit detector achieves a misclassification rate on edited words of 2.2%. (The NULL-model, which marks everything as not edited, has an error rate of 5.9%.) To evaluate our parsing results we introduce a new evaluation metric, the purpose of which is to make evaluation of a parse tree relatively indifferent to the exact tree position of EDITED nodes. By this metric the parser achieves 85.3% precision and 86.5% recall.
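The two-stage architecture and the per-word error rate described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`remove_edited`, `misclassification_rate`, `parse_transcript`), the `detect` and `parse` callables, and the toy sentence are all hypothetical, and the error rate is assumed to be the fraction of words whose edited/not-edited label is wrong.

```python
from typing import Callable, List, Sequence


def remove_edited(tokens: Sequence[str], edited: Sequence[bool]) -> List[str]:
    """Drop every token flagged as edited, keeping the rest in order."""
    if len(tokens) != len(edited):
        raise ValueError("tokens and edited flags must have the same length")
    return [tok for tok, is_edited in zip(tokens, edited) if not is_edited]


def misclassification_rate(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Fraction of words whose edited/not-edited label is wrong (assumed definition)."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label sequences must be non-empty and of equal length")
    wrong = sum(g != p for g, p in zip(gold, predicted))
    return wrong / len(gold)


def parse_transcript(
    tokens: Sequence[str],
    detect: Callable[[Sequence[str]], Sequence[bool]],  # hypothetical edit detector
    parse: Callable[[List[str]], object],               # hypothetical parser
) -> object:
    """Stage 1: flag and remove edited words. Stage 2: parse what remains."""
    flags = detect(tokens)
    return parse(remove_edited(tokens, flags))


# Toy example (invented data): the speaker restarts "I want" as "I need".
tokens = ["I", "want", "I", "need", "a", "flight"]
gold = [True, True, False, False, False, False]

# A null model marks every word as not edited, so its error rate equals
# the proportion of edited words in the data.
null_prediction = [False] * len(tokens)
null_error = misclassification_rate(gold, null_prediction)

cleaned = remove_edited(tokens, gold)
```

Under this reading, the null model's error rate is simply the share of edited words in the corpus, which is why it serves as a natural baseline for the detector.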
Original language: English
Title of host publication: Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: NAACL 2001
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 118-126
Number of pages: 9
Publication status: Published - 2001
Externally published: Yes
Event: Meeting of the North American Chapter of the Association for Computational Linguistics (2nd : 2001) - Pittsburgh, United States
Duration: 1 Jun 2001 to 7 Jun 2001

Conference

Conference: Meeting of the North American Chapter of the Association for Computational Linguistics (2nd : 2001)
Abbreviated title: NAACL '01
Country/Territory: United States
City: Pittsburgh
Period: 1/06/01 to 7/06/01

Bibliographical note

Copyright the Publisher 2001. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
