Speech emotion recognition using gammatone cepstral coefficients and deep learning features

Roneel V. Sharan*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

2 Citations (Scopus)

Abstract

Speech emotion recognition has various applications, such as enhancing human-computer interaction and aiding remote mental health monitoring. This work proposes a method for speech emotion recognition using a combination of handcrafted and deep learning features. In particular, it studies the use of gammatone cepstral coefficients, which use gammatone filters that model the human auditory filters, and deep learning feature embeddings extracted from a pretrained network for audio analysis. A multilayer perceptron is employed for classification on the combined feature set, where feature selection is performed using one-way analysis of variance. The proposed method is evaluated on a dataset of 535 speech recordings containing 7 types of emotions from 10 subjects. An average accuracy of 0.7631 is achieved in classifying the emotions from speech in leave-one-subject-out cross-validation. Analysis of the results shows that gammatone cepstral coefficients improve classification accuracy over the conventional mel-frequency cepstral coefficients, and that the accuracy improves further when they are combined with deep learning features.
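The evaluation pipeline described in the abstract (combined feature set, one-way ANOVA feature selection, multilayer perceptron, leave-one-subject-out cross-validation) can be roughly sketched with scikit-learn as below. This is an illustrative sketch only, not the author's implementation: the features are random synthetic stand-ins for gammatone cepstral coefficients and pretrained-network embeddings, and the feature dimensions, the number of selected features (k=20), and the network size are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic placeholder data (assumption): 5 subjects, 28 recordings each,
# 7 emotion classes. These are NOT the sizes or data used in the paper.
n_subjects, per_subject, n_classes = 5, 28, 7
n_handcrafted, n_deep = 13, 32  # hypothetical feature dimensions

n_samples = n_subjects * per_subject
y = np.tile(np.arange(n_classes), n_samples // n_classes)  # emotion labels
groups = np.repeat(np.arange(n_subjects), per_subject)     # subject IDs

# Stand-ins for handcrafted (e.g. cepstral) and deep embedding features.
handcrafted = rng.normal(size=(n_samples, n_handcrafted))
deep = rng.normal(size=(n_samples, n_deep))
handcrafted[:, :5] += y[:, None]  # inject a class-dependent signal

# Combined feature set: simple concatenation along the feature axis.
X = np.hstack([handcrafted, deep])

# f_classif computes the one-way ANOVA F-statistic per feature.
# Placing the selector inside the pipeline refits it on each training fold.
pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=20),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)

# Leave-one-subject-out: each fold holds out all recordings of one subject.
scores = cross_val_score(pipeline, X, y, groups=groups, cv=LeaveOneGroupOut())
print("per-subject accuracy:", scores)
print("average accuracy:", scores.mean())
```

Grouping the folds by subject means that no speaker appears in both the training and the test data of a fold, which is what distinguishes leave-one-subject-out from ordinary k-fold cross-validation.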

Original language: English
Title of host publication: Proceedings of the 2023 IEEE International Conference on Machine Learning and Applied Network Technologies
Subtitle of host publication: ICMLANT 2023
Editors: Manuel Cardona, Vijender K. Solanki
Place of Publication: El Salvador
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 4
ISBN (Electronic): 9798350303919
ISBN (Print): 9798350303926
DOIs
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Machine Learning and Applied Network Technologies, ICMLANT 2023 - Virtual, Online, El Salvador
Duration: 14 Dec 2023 to 15 Dec 2023

Publication series

Name: IEEE International Conference on Machine Learning and Applied Network Technologies
Publisher: IEEE

Conference

Conference: 2023 IEEE International Conference on Machine Learning and Applied Network Technologies, ICMLANT 2023
Country/Territory: El Salvador
City: Virtual, Online
Period: 14/12/23 to 15/12/23

Keywords

  • deep learning features
  • feature selection
  • gammatone cepstral coefficients
  • mel-spectrogram
  • multilayer perceptron
