Recognizing disfluencies in conversational speech

Matthew Lease*, Mark Johnson, Eugene Charniak

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

36 Citations (Scopus)


We present a system for modeling disfluency in conversational speech: repairs, fillers, and self-interruption points (IPs). For each sentence, candidate repair analyses are generated by a stochastic tree adjoining grammar (TAG) noisy-channel model. A probabilistic syntactic language model scores the fluency of each analysis, and a maximum-entropy model selects the most likely analysis given the language model score and other features. Fillers are detected independently via a small set of deterministic rules, and IPs are detected by combining the output of repair and filler detection modules. In the recent Rich Transcription Fall 2004 (RT-04F) blind evaluation, systems competed to detect these three forms of disfluency under two input conditions: a best-case scenario of manually transcribed words and a fully automatic case of automatic speech recognition (ASR) output. For all three tasks and on both types of input, our system was the top performer in the evaluation.
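
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The filler lexicon, feature names, and weights are hypothetical, and the convention that an IP falls at the end of each repaired span and immediately before each filler is an assumption made for illustration. The TAG noisy-channel generator and the syntactic language model are not reproduced here; their outputs are stood in for by precomputed, made-up scores.

```python
import math
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) word indices

# Hypothetical filler lexicon (longer phrases first); the abstract does not
# list the actual deterministic rules used by the system.
FILLER_PHRASES: List[Tuple[str, ...]] = [
    ("you", "know"), ("i", "mean"), ("uh",), ("um",),
]


def detect_fillers(words: Sequence[str]) -> List[Span]:
    """Rule-based filler detection: greedy left-to-right lexicon match."""
    lowered = [w.lower() for w in words]
    spans: List[Span] = []
    i = 0
    while i < len(lowered):
        for phrase in FILLER_PHRASES:
            n = len(phrase)
            if tuple(lowered[i:i + n]) == phrase:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no phrase matched at position i
    return spans


def select_analysis(
    candidates: Sequence[Tuple[List[Span], Dict[str, float]]],
    weights: Dict[str, float],
) -> Tuple[int, float]:
    """Log-linear (maximum-entropy style) selection among candidate analyses.

    Each candidate is (repair_spans, features). Returns the index of the
    highest-scoring candidate and its normalized probability.
    """
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for _, feats in candidates
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, exps[best] / sum(exps)


def interruption_points(repairs: Sequence[Span], fillers: Sequence[Span]) -> List[int]:
    """Combine repair and filler output into IP word-boundary indices.

    Assumption: an IP sits at the end of each repaired span and just before
    each filler. Boundary k means "between word k-1 and word k".
    """
    ips = {end for _, end in repairs}
    ips |= {start for start, _ in fillers}
    return sorted(ips)


words = "I want uh I need a flight".split()
# Two hypothetical candidate analyses with made-up model scores.
candidates = [
    ([], {"lm_logprob": -20.0, "channel_logprob": -1.0}),        # no repair
    ([(0, 2)], {"lm_logprob": -12.0, "channel_logprob": -3.0}),  # "I want" repaired
]
weights = {"lm_logprob": 1.0, "channel_logprob": 1.0}

best, prob = select_analysis(candidates, weights)
fillers = detect_fillers(words)
ips = interruption_points(candidates[best][0], fillers)
```

In this toy input the repaired span "I want" ends where the filler "uh" begins, so both modules contribute the same boundary and the combined set contains it only once.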

Original language: English
Pages (from-to): 1566-1573
Number of pages: 8
Journal: IEEE Transactions on Audio, Speech and Language Processing
Issue number: 5
Publication status: Published - Sep 2006
Externally published: Yes


Keywords:
  • Disfluency modeling
  • Natural language processing
  • Rich transcription
  • Speech processing

