Abstract
We present a system for modeling disfluency in conversational speech: repairs, fillers, and self-interruption points (IPs). For each sentence, candidate repair analyses are generated by a stochastic tree adjoining grammar (TAG) noisy-channel model. A probabilistic syntactic language model scores the fluency of each analysis, and a maximum-entropy model selects the most likely analysis given the language model score and other features. Fillers are detected independently via a small set of deterministic rules, and IPs are detected by combining the output of repair and filler detection modules. In the recent Rich Transcription Fall 2004 (RT-04F) blind evaluation, systems competed to detect these three forms of disfluency under two input conditions: a best-case scenario of manually transcribed words and a fully automatic case of automatic speech recognition (ASR) output. For all three tasks and on both types of input, our system was the top performer in the evaluation.
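The selection step described above, in which a maximum-entropy model chooses among candidate repair analyses, can be illustrated with a minimal log-linear sketch. Everything in it is an assumption made for illustration and does not come from the paper: the function name `select_analysis`, the feature names `lm_logprob` and `channel_logprob`, the weights, the example utterance, and all numeric scores.

```python
import math

def select_analysis(candidates, weights):
    """Pick the most probable candidate under a log-linear (maximum-entropy) model.

    candidates: list of dicts, each with a "features" mapping of name -> value.
    weights:    mapping of feature name -> weight (hypothetical values here).
    Returns (best_candidate, probabilities), one probability per candidate.
    """
    # Linear score for each candidate: sum of weight * feature value.
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in c["features"].items())
        for c in candidates
    ]
    # Softmax over candidates, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best_index = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best_index], probs

# Hypothetical utterance: "a flight to Boston uh to Denver" (word indices 0-6).
# Each candidate marks a different span as the words to be edited out.
# The log-probabilities below are invented, not model output.
candidates = [
    {"edited": (2, 4), "features": {"lm_logprob": -20.0, "channel_logprob": -4.0}},
    {"edited": None, "features": {"lm_logprob": -26.0, "channel_logprob": -1.0}},
]
weights = {"lm_logprob": 1.0, "channel_logprob": 0.5}  # illustrative weights

best, probs = select_analysis(candidates, weights)
```

With these invented numbers, the candidate that removes "to Boston" scores higher because the language model finds the remaining words more fluent. In a real system the weights would be learned from annotated data rather than set by hand.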
Original language | English |
---|---|
Pages (from-to) | 1566-1573 |
Number of pages | 8 |
Journal | IEEE Transactions on Audio, Speech and Language Processing |
Volume | 14 |
Issue number | 5 |
Publication status | Published - Sept 2006 |
Externally published | Yes |
Keywords
- Disfluency modeling
- Natural language processing
- Rich transcription
- Speech processing