Improving disfluency detection by self-training a self-attentive model

Paria Jamshid Lou*, Mark Johnson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review


Self-attentive neural syntactic parsers using contextualized word embeddings (e.g. ELMo or BERT) currently produce state-of-the-art results in joint parsing and disfluency detection in speech transcripts. Since the contextualized word embeddings are pre-trained on a large amount of unlabeled data, using additional unlabeled data to train a neural model might seem redundant. However, we show that self-training - a semi-supervised technique for incorporating unlabeled data - sets a new state-of-the-art for the self-attentive parser on disfluency detection, demonstrating that self-training provides benefits orthogonal to the pre-trained contextualized word representations. We also show that ensembling self-trained parsers provides further gains for disfluency detection.
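To make the idea concrete, here is a minimal, hypothetical sketch of the self-training loop the abstract describes: train on labeled data, pseudo-label the unlabeled data with the current model, then retrain on the union. The toy memorising "classifier" and the word lists are illustrative stand-ins, not the paper's self-attentive parser or its Switchboard data.

```python
# Toy self-training sketch (hypothetical; labels: 1 = disfluent token, 0 = fluent).

def train(examples):
    """Toy 'model': memorise which words were labelled disfluent."""
    disfluent = {word for word, label in examples if label == 1}
    return lambda word: 1 if word in disfluent else 0

def self_train(labeled, unlabeled, rounds=1):
    """Pseudo-label the unlabeled words with the current model, then retrain
    on labeled + pseudo-labeled data; repeat for a fixed number of rounds."""
    model = train(labeled)
    for _ in range(rounds):
        pseudo = [(word, model(word)) for word in unlabeled]
        model = train(labeled + pseudo)
    return model

labeled = [("uh", 1), ("the", 0), ("um", 1)]      # hand-annotated examples
unlabeled = ["uh", "cat", "um"]                    # unannotated transcript words
model = self_train(labeled, unlabeled)
print(model("uh"))   # 1 (predicted disfluent)
print(model("cat"))  # 0 (predicted fluent)
```

In practice, self-training pipelines usually keep only high-confidence pseudo-labels, and an ensemble (as in the paper's final result) would combine the predictions of several independently self-trained models.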

Original language: English
Title of host publication: The 58th Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: Proceedings of the Conference
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 10
ISBN (Print): 9781952148255
Publication status: Published - 2020
Event: 58th Annual Meeting of the Association for Computational Linguistics (ACL)
Duration: 5 Jul 2020 – 10 Jul 2020


Conference: 58th Annual Meeting of the Association for Computational Linguistics (ACL)

Bibliographical note

Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
