The impact of language models and loss functions on repair disfluency detection

Simon Zwarts, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > Research > peer-review

Abstract

Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss function on the performance of a linear reranker that rescores the 25-best output of a noisy channel model. We show that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance. Our approach uses a log-linear reranker, operating on the top n analyses of a noisy channel model. We use large language models, introduce new features into this reranker and examine different optimisation strategies. We obtain a disfluency detection f-score of 0.838, which improves upon the current state of the art.
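The reranking step and the f-score metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` structure, the feature names, and the weights are hypothetical, and the sketch only shows how a linear model picks one analysis from an n-best list and how a word-level f-score over disfluency labels might be computed. It does not show how the weights are trained under either loss.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Candidate:
    """One analysis from the n-best list (hypothetical structure)."""
    edited: List[bool]          # True where a word is marked as disfluent
    features: Dict[str, float]  # hypothetical feature name -> value


def score(weights: Dict[str, float], cand: Candidate) -> float:
    """Linear score: dot product of the weight vector and the feature vector."""
    return sum(weights.get(name, 0.0) * value
               for name, value in cand.features.items())


def rerank(weights: Dict[str, float], nbest: Sequence[Candidate]) -> Candidate:
    """Return the highest-scoring candidate in the n-best list."""
    return max(nbest, key=lambda cand: score(weights, cand))


def f_score(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall over words marked disfluent."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g and p)
    n_pred = sum(1 for p in predicted if p)
    n_gold = sum(1 for g in gold if g)
    if true_pos == 0 or n_pred == 0 or n_gold == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical two-candidate list for a five-word utterance.
    gold = [False, True, True, False, False]
    nbest = [
        Candidate([False, True, True, False, False],
                  {"lm_logprob": -10.0, "channel_logprob": -4.0}),
        Candidate([False, False, True, False, False],
                  {"lm_logprob": -12.0, "channel_logprob": -2.0}),
    ]
    weights = {"lm_logprob": 1.0, "channel_logprob": 0.5}  # assumed values
    best = rerank(weights, nbest)
    print(f_score(gold, best.edited))
```

In this sketch the language model score enters as just another feature, so swapping in a different language model changes only the feature values, not the reranking code.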

Language: English
Title of host publication: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, ACL-HLT 2011
Place of Publication: Portland, OR
Publisher: Association for Computational Linguistics (ACL)
Pages: 703-711
Number of pages: 9
Volume: 1
ISBN (Print): 9781932432879
Publication status: Published - 2011
Event: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011 - Portland, OR, United States
Duration: 19 Jun 2011 to 24 Jun 2011

Other

Other: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011
Country: United States
City: Portland, OR
Period: 19/06/11 to 24/06/11

Fingerprint

language
performance
spoken language
Repair
Disfluency
Language Model
Language Loss
Utterance
Spoken Language

Bibliographical note

Copyright the Publisher 2011. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Cite this

Zwarts, S., & Johnson, M. (2011). The impact of language models and loss functions on repair disfluency detection. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, ACL-HLT 2011 (Vol. 1, pp. 703-711). Portland, OR: Association for Computational Linguistics (ACL).
@inproceedings{b60d154ad56c4b3dae173a2a1a8835cc,
title = "The impact of language models and loss functions on repair disfluency detection",
abstract = "Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss function on the performance of a linear reranker that rescores the 25-best output of a noisy channel model. We show that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance. Our approach uses a log-linear reranker, operating on the top n analyses of a noisy channel model. We use large language models, introduce new features into this reranker and examine different optimisation strategies. We obtain a disfluency detection f-score of 0.838, which improves upon the current state of the art.",
author = "Simon Zwarts and Mark Johnson",
note = "Copyright the Publisher 2011. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.",
year = "2011",
language = "English",
isbn = "9781932432879",
volume = "1",
pages = "703--711",
booktitle = "Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, ACL-HLT 2011",
publisher = "Association for Computational Linguistics (ACL)",

}

TY - GEN

T1 - The impact of language models and loss functions on repair disfluency detection

AU - Zwarts, Simon

AU - Johnson, Mark

N1 - Copyright the Publisher 2011. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

PY - 2011

Y1 - 2011

N2 - Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss function on the performance of a linear reranker that rescores the 25-best output of a noisy channel model. We show that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance. Our approach uses a log-linear reranker, operating on the top n analyses of a noisy channel model. We use large language models, introduce new features into this reranker and examine different optimisation strategies. We obtain a disfluency detection f-score of 0.838, which improves upon the current state of the art.

UR - http://www.scopus.com/inward/record.url?scp=84857820187&partnerID=8YFLogxK

M3 - Conference proceeding contribution

SN - 9781932432879

VL - 1

SP - 703

EP - 711

BT - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, ACL-HLT 2011

PB - Association for Computational Linguistics (ACL)

CY - Portland, OR

ER -
