Abstract
Disfluency detection is usually an intermediate step between an automatic speech recognition (ASR) system and a downstream task. By contrast, this paper aims to investigate the task of end-to-end speech recognition and disfluency removal. We specifically explore whether it is possible to train an ASR model to directly map disfluent speech into fluent transcripts, without relying on a separate disfluency detection model. We show that end-to-end models do learn to directly generate fluent transcripts; however, their performance is slightly worse than a baseline pipeline approach consisting of an ASR system and a specialized disfluency detection model. We also propose two new metrics for evaluating integrated ASR and disfluency removal models. The findings of this paper can serve as a benchmark for further research on the task of end-to-end speech recognition and disfluency removal in the future.
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics |
Subtitle of host publication | Findings of ACL EMNLP 2020 |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 2051-2061 |
Number of pages | 11 |
ISBN (Electronic) | 9781952148903 |
Publication status | Published - 2020 |
Event | Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020 - Virtual, Online. Duration: 16 Nov 2020 → 20 Nov 2020 |
Publication series
Name | Findings of the Association for Computational Linguistics Findings of ACL: EMNLP 2020 |
---|
Conference
Conference | Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020 |
---|---|
City | Virtual, Online |
Period | 16/11/20 → 20/11/20 |
Bibliographical note
Copyright the Publisher 2021. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Projects
- Improved syntactic and semantic analysis for natural language processing (Finished)
  Johnson, M. & Steedman, M.
  30/06/16 → 31/12/21
  Project: Research