Towards speech classification from acoustic and vocal tract data in real-time MRI

Yaoyao Yue, Michael Proctor, Luping Zhou, Rijul Gupta, Tharinda Piyadasa, Amelia Gully, Kirrie Ballard, Craig Jin

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Real-time magnetic resonance image (rtMRI) data of the upper airway provides a rich source of information about vocal tract shaping that can inform phonemic analysis and classification. We describe a multimodal phonemic classifier that combines articulatory data with speech audio features to improve performance. A deep network model processes rtMRI video data using ResNet18 and speech audio using a custom CNN, and then combines the two data streams using a Transformer layer, whose multi-head self-attention mechanism exploits the correlation between the two streams to improve vowel-consonant-vowel classification. The classification accuracy of both the unimodal and multimodal models shows substantial improvement on previous work (> 38%). The addition of audio features improves classification accuracy in the multimodal model by 7% compared with the unimodal model using articulatory data. We analyze the model and discuss the phonetic implications.
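
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each modality has already been reduced to a single embedding vector (standing in for the ResNet18 and custom CNN outputs), uses one attention head instead of several, and replaces the learned query/key/value projections with the identity. The names `video_token`, `audio_token` and `self_attention`, and all numeric values, are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections (a real Transformer layer
    learns these and uses several heads in parallel).
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is an attention-weighted mix of all tokens.
        outputs.append([sum(w[j] * tokens[j][i] for j in range(len(tokens)))
                        for i in range(d)])
    return outputs, weights


# Hypothetical 4-dimensional embeddings, one per modality.
video_token = [1.0, 0.0, 1.0, 0.0]  # stand-in for an rtMRI video feature
audio_token = [0.0, 1.0, 1.0, 0.0]  # stand-in for a speech audio feature

fused, attn = self_attention([video_token, audio_token])

# Mean-pool the two fused tokens into one vector for a classifier head.
pooled = [(a + b) / 2 for a, b in zip(*fused)]
```

In this sketch each fused token mixes information from both modalities in proportion to the attention weights in `attn`, which is the mechanism through which a Transformer layer can relate the articulatory and acoustic streams before classification.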

Original language: English
Title of host publication: Interspeech 2024
Subtitle of host publication: Proceedings of the 25th Annual Conference of the International Speech Communication Association
Editors: Itshak Lapidot, Sharon Gannot
Place of Publication: Baixas, France
Publisher: International Speech Communication Association
Pages: 1345-1349
Number of pages: 5
DOIs
Publication status: Published - 2024
Event: Interspeech Conference (24th : 2024) - Kos, Greece
Duration: 1 Sept 2024 to 5 Sept 2024

Publication series

Name: INTERSPEECH
ISSN (Print): 2308-457X
ISSN (Electronic): 2958-1796

Conference

Conference: Interspeech Conference (24th : 2024)
Country/Territory: Greece
City: Kos
Period: 1/09/24 to 5/09/24

Keywords

  • multimodal networks
  • phonemic classification
  • real-time MRI
  • Transformer
  • vocal tract
