TY - GEN
T1 - Knowledge of accent differences can predict speech recognition errors
AU - Szalay, Tünde
AU - Shahin, Mostafa
AU - Ahmed, Beena
AU - Ballard, Kirrie
PY - 2022
Y1 - 2022
N2 - If accent differences can predict the type of speech recognition errors, a smaller dataset systematically representing accent differences might be sufficient and less resource intensive for adapting an automatic speech recognition (ASR) to a novel variety compared to training the ASR on a large, unsystematic dataset. However, it is not known whether ASR errors pattern according to accent differences. Therefore, we tested the performance of Google's General American (GenAm) and Standard Australian English (SAusE) ASR on both dialects using words systematically representing accent differences. Accent differences were quantified using the different number of vowel phonemes, the different phonetic quality of vowels, and differences in rhoticity (i.e., presence/absence of postvocalic /ɹ/). Our results confirm that word recognition is significantly more accurate when ASR dialect matches the speaker dialect compared to the mismatched condition. Our results reveal that GenAm ASR is less accurate on SAusE speakers due to the higher number of vowel phonemes and the lack of postvocalic /ɹ/ in SAusE. Thus, the data need of adapting ASR from GenAm to SAusE might be reduced by using a small dataset focusing on differences in the size of vowel inventory and in rhoticity.
AB - If accent differences can predict the type of speech recognition errors, a smaller dataset systematically representing accent differences might be sufficient and less resource intensive for adapting an automatic speech recognition (ASR) to a novel variety compared to training the ASR on a large, unsystematic dataset. However, it is not known whether ASR errors pattern according to accent differences. Therefore, we tested the performance of Google's General American (GenAm) and Standard Australian English (SAusE) ASR on both dialects using words systematically representing accent differences. Accent differences were quantified using the different number of vowel phonemes, the different phonetic quality of vowels, and differences in rhoticity (i.e., presence/absence of postvocalic /ɹ/). Our results confirm that word recognition is significantly more accurate when ASR dialect matches the speaker dialect compared to the mismatched condition. Our results reveal that GenAm ASR is less accurate on SAusE speakers due to the higher number of vowel phonemes and the lack of postvocalic /ɹ/ in SAusE. Thus, the data need of adapting ASR from GenAm to SAusE might be reduced by using a small dataset focusing on differences in the size of vowel inventory and in rhoticity.
KW - accent differences
KW - adapting ASR to novel varieties
KW - automatic speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85140080742&partnerID=8YFLogxK
UR - https://tuendeszalay.github.io/speechtech/Szalay_etal2022_knowledge_Interspeech.pdf
UR - http://purl.org/au-research/grants/arc/DP200103006
U2 - 10.21437/Interspeech.2022-10162
DO - 10.21437/Interspeech.2022-10162
M3 - Conference proceeding contribution
AN - SCOPUS:85140080742
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 1372
EP - 1376
BT - Interspeech 2022
A2 - Ko, Hanseok
A2 - Hansen, John H. L.
PB - International Speech Communication Association (ISCA)
CY - Baixas, France
T2 - Annual Conference of the International Speech Communication Association (23rd : 2022)
Y2 - 18 September 2022 through 22 September 2022
ER -