An improved model for recognizing disfluencies in conversational speech

Mark Johnson, Eugene Charniak, Matthew Lease

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review


This paper presents a novel metadata extraction (MDE) system for automatically detecting edited words, fillers, and self-interruption points in conversational speech. Our edit word detection sub-system combines a Tree Adjoining Grammar (TAG) noisy channel model, a statistical syntactic language model, and a MaxEnt reranker. Hand-built, deterministic rules are used to detect fillers. Self-interruption points are explicitly determined by detected fillers and edited words. We have evaluated our system for these three tasks on two types of input: manually annotated words and automatically recognized speech-to-text tokens. In all six cases, our system has improved the state-of-the-art, as measured in a recent blind evaluation.
Original language: English
Title of host publication: Rich Transcription 2004 Fall workshop (RT-04F)
Place of Publication: Palisades, NY
Number of pages: 7
Publication status: Published - 2004
Externally published: Yes
Event: Rich Transcription 2004 Fall workshop - Palisades, United States
Duration: 7 Nov 2004 - 10 Nov 2004


Conference: Rich Transcription 2004 Fall workshop
Abbreviated title: RT-04F
Country/Territory: United States

