Abstract
Self-attentive neural syntactic parsers using contextualized word embeddings (e.g. ELMo or BERT) currently produce state-of-the-art results in joint parsing and disfluency detection in speech transcripts. Since the contextualized word embeddings are pre-trained on a large amount of unlabeled data, using additional unlabeled data to train a neural model might seem redundant. However, we show that self-training - a semi-supervised technique for incorporating unlabeled data - sets a new state-of-the-art for the self-attentive parser on disfluency detection, demonstrating that self-training provides benefits orthogonal to the pre-trained contextualized word representations. We also show that ensembling self-trained parsers provides further gains for disfluency detection.
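For readers unfamiliar with the technique, the self-training referred to in the abstract is the standard semi-supervised loop: train a model on labeled data, use it to label unlabeled data, then retrain on the combination. The sketch below illustrates that loop in Python; the `train_fn` and `predict_fn` callables, the confidence threshold, and the tag set are illustrative assumptions, not the authors' implementation.

```python
# A minimal self-training sketch (illustrative only; not the paper's code).
# `train_fn` and `predict_fn` stand in for whatever parser training and
# decoding routines are available; both are assumptions.
from typing import Callable, List, Sequence, Tuple

Sentence = Sequence[str]            # tokenized utterance
Labels = Sequence[str]              # per-token disfluency tags, e.g. "E" vs. "_"
Example = Tuple[Sentence, Labels]


def self_train(
    labeled: List[Example],
    unlabeled: List[Sentence],
    train_fn: Callable[[List[Example]], object],                    # returns a trained model
    predict_fn: Callable[[object, Sentence], Tuple[Labels, float]], # labels + confidence
    rounds: int = 1,
    threshold: float = 0.9,
):
    """Generic self-training: train, pseudo-label unlabeled data, retrain."""
    model = train_fn(labeled)                     # supervised baseline on gold labels
    for _ in range(rounds):
        pseudo: List[Example] = []
        for sent in unlabeled:
            tags, conf = predict_fn(model, sent)  # label an unlabeled transcript
            if conf >= threshold:                 # keep only confident pseudo-labels
                pseudo.append((sent, tags))
        model = train_fn(labeled + pseudo)        # retrain on gold + pseudo-labeled data
    return model
```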
Original language | English |
---|---|
Title of host publication | The 58th Annual Meeting of the Association for Computational Linguistics |
Subtitle of host publication | Proceedings of the Conference |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 3754-3763 |
Number of pages | 10 |
ISBN (Print) | 9781952148255 |
Publication status | Published - 2020 |
Event | 58th Annual Meeting of the Association for Computational Linguistics (ACL), 5 Jul 2020 → 10 Jul 2020 |
Conference
Conference | 58th Annual Meeting of the Association for Computational Linguistics (ACL) |
---|---|
Period | 5/07/20 → 10/07/20 |
Bibliographical note
Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Projects
- Improved syntactic and semantic analysis for natural language processing
  Johnson, M. & Steedman, M.
  30/06/16 → 31/12/21
  Project: Research (Finished)