Abstract
We present a system for modeling disfluency in conversational speech: repairs, fillers, and self-interruption points (IPs). For each sentence, candidate repair analyses are generated by a stochastic tree adjoining grammar (TAG) noisy-channel model. A probabilistic syntactic language model scores the fluency of each analysis, and a maximum-entropy model selects the most likely analysis given the language model score and other features. Fillers are detected independently via a small set of deterministic rules, and IPs are detected by combining the output of repair and filler detection modules. In the recent Rich Transcription Fall 2004 (RT-04F) blind evaluation, systems competed to detect these three forms of disfluency under two input conditions: a best-case scenario of manually transcribed words and a fully automatic case of automatic speech recognition (ASR) output. For all three tasks and on both types of input, our system was the top performer in the evaluation.
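The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The filler lexicon, the feature names (`lm_logprob`, `channel_logprob`), the weights, and the convention of placing an IP at the end of each reparandum and at the start of each filler are all assumptions made for illustration. The TAG noisy-channel generator and the syntactic language model are replaced by hand-written candidate analyses with made-up scores.

```python
import math

# Hypothetical filler lexicon; the abstract does not list the actual rules.
FILLED_PAUSES = {"uh", "um"}
DISCOURSE_MARKERS = {("you", "know"), ("i", "mean")}


def detect_fillers(words):
    """Deterministic filler detection: return half-open (start, end) word spans."""
    spans = []
    i = 0
    while i < len(words):
        pair = tuple(w.lower() for w in words[i:i + 2])
        if len(pair) == 2 and pair in DISCOURSE_MARKERS:
            spans.append((i, i + 2))
            i += 2
        elif words[i].lower() in FILLED_PAUSES:
            spans.append((i, i + 1))
            i += 1
        else:
            i += 1
    return spans


def select_analysis(candidates, weights):
    """Maximum-entropy style selection: softmax over linear feature scores."""
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in c["features"].items())
        for c in candidates
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs[best]


def detect_ips(reparanda, fillers):
    """Combine repair and filler output into IP positions.

    Assumed convention: boundary k lies just before words[k]; an IP is placed
    at the end of each reparandum and at the start of each filler.
    """
    ips = {end for _, end in reparanda}
    ips |= {start for start, _ in fillers}
    return sorted(ips)


words = "I want a flight to Boston uh to Denver".split()

# Two hand-written candidate analyses standing in for the noisy-channel output.
candidates = [
    {"reparanda": [(4, 6)],  # "to Boston" treated as the reparandum
     "features": {"lm_logprob": -20.0, "channel_logprob": -3.0}},
    {"reparanda": [],        # no repair hypothesized
     "features": {"lm_logprob": -28.0, "channel_logprob": -0.5}},
]
weights = {"lm_logprob": 1.0, "channel_logprob": 1.0}  # illustrative values

fillers = detect_fillers(words)
best, prob = select_analysis(candidates, weights)
ips = detect_ips(best["reparanda"], fillers)
```

In this toy example the reparandum "to Boston" and the filler "uh" share a boundary, so the two modules agree on a single IP. A real system would learn the weights from annotated data and use many more features than the two shown here.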
| Original language | English |
|---|---|
| Pages (from-to) | 1566-1573 |
| Number of pages | 8 |
| Journal | IEEE Transactions on Audio, Speech and Language Processing |
| Volume | 14 |
| Issue number | 5 |
| Publication status | Published - Sept 2006 |
| Externally published | Yes |
Keywords
- Disfluency modeling
- Natural language processing
- Rich transcription
- Speech processing
Fingerprint
Dive into the research topics of 'Recognizing disfluencies in conversational speech'. Together they form a unique fingerprint.