Significance of phonological features in speech emotion recognition

Wei Wang, Paul A. Watters, Xinyi Cao, Lingjie Shen, Bo Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

This paper proposes a novel Speech Emotion Recognition (SER) method based on phonological features. Intuitively, as expert knowledge derived from linguistics, phonological features are correlated with emotions; however, they have seldom been used as features to improve SER. Motivated by this, we aim to use phonological features to further improve SER accuracy, since they can provide complementary information for the task. We also explore the relationship between phonological features and emotions. First, rather than relying on acoustic features alone, we devise a new SER approach that fuses phonological representations with acoustic features. This approach yields a significant improvement in SER performance on the publicly available Interactive Emotional Dyadic Motion Capture (IEMOCAP) database. Second, the experimental results show that the top-performing method for categorical emotion recognition is a deep learning-based classifier, which achieves an unweighted average recall (UAR) of 60.02%. Finally, we investigate the most discriminative features and identify several patterns of emotional rhyme based on the phonological representations.
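The abstract does not say at which level the two feature types are fused or how UAR is computed in the authors' pipeline. The sketch below is only an illustration under two assumptions: fusion is done at the feature level by concatenating each utterance's acoustic vector with its phonological representation, and UAR follows its usual definition as the mean of per-class recalls. The helpers `fuse_features` and `unweighted_average_recall` are hypothetical names, not the authors' code.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def fuse_features(acoustic: Sequence[float], phonological: Sequence[float]) -> list[float]:
    """Hypothetical feature-level (early) fusion: concatenate one utterance's
    acoustic feature vector with its phonological representation."""
    return list(acoustic) + list(phonological)


def unweighted_average_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """UAR: the mean of per-class recalls, so each emotion class counts equally
    regardless of how many utterances it contains."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    support = Counter(y_true)  # number of utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [hits[c] / n for c, n in support.items()]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    print(fuse_features([0.1, 0.2], [1, 0, 1]))  # [0.1, 0.2, 1, 0, 1]
    y_true = ["angry", "angry", "angry", "neutral"]
    y_pred = ["angry", "angry", "neutral", "neutral"]
    print(round(unweighted_average_recall(y_true, y_pred), 4))  # 0.8333
```

In the usage example, plain accuracy would be 3/4 = 0.75, while UAR averages the recall of "angry" (2/3) and "neutral" (1/1) to about 0.8333. Because each class is weighted equally, UAR is a common choice when emotion classes are imbalanced, as they are in IEMOCAP.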

Original language: English
Pages (from-to): 633-642
Number of pages: 10
Journal: International Journal of Speech Technology
Volume: 23
Issue number: 3
Publication status: Published - Sep 2020
Externally published: Yes

Keywords

  • Acoustic features
  • Feature analysis
  • Phonological features
  • Speech emotion recognition
