
Multi-modal emotion recognition using tensor decomposition fusion and self-supervised multi-tasking

Rui Wang, Jiawei Zhu, Shoujin Wang, Tao Wang, Jingze Huang, Xianxun Zhu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

With technological advancements, we can now capture rich dialogue content, tone, text, and visual data through tools such as microphones, the internet, and cameras. However, relying on a single modality for emotion analysis often fails to reflect the true emotional state, because it overlooks the dynamic correlations between modalities. To address this, our study introduces a multimodal emotion recognition method that combines tensor decomposition fusion with self-supervised multi-task learning. The method first applies Tucker decomposition to reduce the model’s parameter count, lowering the risk of overfitting. It then builds a joint learning mechanism over multimodal and unimodal tasks and incorporates label generation to more accurately capture the emotional differences between modalities. Extensive experiments on the public CMU-MOSI and CMU-MOSEI datasets show that our method significantly outperforms existing approaches. The related code is open-sourced at https://github.com/ZhuJw31/MMER-TD.
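The abstract only sketches the fusion step; the published architecture lives in the linked repository. As a rough illustration of how a Tucker-decomposed fusion tensor reduces parameters, here is a minimal PyTorch sketch. All names, feature dimensions, and the single-core design are hypothetical choices for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class TuckerFusion(nn.Module):
    """Fuse three modality vectors via a Tucker-decomposed fusion tensor.

    Instead of learning a full (d_text x d_audio x d_video x d_out)
    fusion tensor, we learn one low-rank factor matrix per modality plus
    a compact core tensor; the parameter saving comes from the core
    being (rank x rank x rank x d_out) with rank << d_modality.
    """
    def __init__(self, d_text, d_audio, d_video, rank, d_out):
        super().__init__()
        # Factor matrices: one low-rank projection per modality.
        self.proj_t = nn.Linear(d_text, rank, bias=False)
        self.proj_a = nn.Linear(d_audio, rank, bias=False)
        self.proj_v = nn.Linear(d_video, rank, bias=False)
        # Small learnable core tensor of shape (rank, rank, rank, d_out).
        self.core = nn.Parameter(torch.randn(rank, rank, rank, d_out) * 0.02)

    def forward(self, x_t, x_a, x_v):
        t = self.proj_t(x_t)  # (batch, rank)
        a = self.proj_a(x_a)  # (batch, rank)
        v = self.proj_v(x_v)  # (batch, rank)
        # Contract the core with the three projected modality vectors.
        return torch.einsum('bi,bj,bk,ijko->bo', t, a, v, self.core)

# Toy usage with hypothetical feature sizes for text/audio/video.
fusion = TuckerFusion(d_text=768, d_audio=74, d_video=35, rank=16, d_out=128)
z = fusion(torch.randn(8, 768), torch.randn(8, 74), torch.randn(8, 35))
print(z.shape)  # torch.Size([8, 128])
```

With these toy dimensions, a full fusion tensor would need roughly 768 × 74 × 35 × 128 ≈ 2.5 × 10⁸ parameters, while the decomposed form above needs about 5.4 × 10⁵, which is the kind of reduction the abstract attributes to Tucker decomposition.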

Original language: English
Article number: 39
Pages (from-to): 1-14
Number of pages: 14
Journal: International Journal of Multimedia Information Retrieval
Volume: 13
Issue number: 4
DOIs:
Publication status: Published - Dec 2024
Externally published: Yes

Keywords

  • Self-supervised
  • Multi-tasking
  • Emotion recognition
  • Multimodal
