Knowledge of accent differences can predict speech recognition errors

Tünde Szalay, Mostafa Shahin, Beena Ahmed, Kirrie Ballard

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

If accent differences can predict the type of speech recognition errors, a smaller dataset systematically representing accent differences might be sufficient and less resource intensive for adapting an automatic speech recognition (ASR) to a novel variety compared to training the ASR on a large, unsystematic dataset. However, it is not known whether ASR errors pattern according to accent differences. Therefore, we tested the performance of Google's General American (GenAm) and Standard Australian English (SAusE) ASR on both dialects using words systematically representing accent differences. Accent differences were quantified using the different number of vowel phonemes, the different phonetic quality of vowels, and differences in rhoticity (i.e., presence/absence of postvocalic /ɹ/). Our results confirm that word recognition is significantly more accurate when ASR dialect matches the speaker dialect compared to the mismatched condition. Our results reveal that GenAm ASR is less accurate on SAusE speakers due to the higher number of vowel phonemes and the lack of postvocalic /ɹ/ in SAusE. Thus, the data need of adapting ASR from GenAm to SAusE might be reduced by using a small dataset focusing on differences in the size of vowel inventory and in rhoticity.

Original language: English
Title of host publication: Interspeech 2022
Subtitle of host publication: Proceedings of the 23rd Annual Conference of the International Speech Communication Association
Editors: Hanseok Ko, John H. L. Hansen
Place of Publication: Baixas, France
Publisher: International Speech Communication Association (ISCA)
Pages: 1372-1376
Number of pages: 5
Publication status: Published - 2022
Externally published: Yes
Event: Annual Conference of the International Speech Communication Association (23rd : 2022) - Incheon, Korea, Republic of
Duration: 18 Sept 2022 to 22 Sept 2022

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: Annual Conference of the International Speech Communication Association (23rd : 2022)
Abbreviated title: INTERSPEECH 2022
Country/Territory: Korea, Republic of
City: Incheon
Period: 18/09/22 to 22/09/22

Keywords

  • accent differences
  • adapting ASR to novel varieties
  • automatic speech recognition
