TY - JOUR
T1 - Contrastive-based removal of negative information in multimodal emotion analysis
AU - Wang, Rui
AU - Wang, Yaoyang
AU - Cambria, Erik
AU - Fan, Xuhui
AU - Yu, Xiaohan
AU - Huang, Yao
AU - E, Xiaosong
AU - Zhu, Xianxun
PY - 2025/6
Y1 - 2025/6
N2 - Multimodal sentiment analysis bridges the communication gap between humans and machines by accurately recognizing human emotions. However, existing approaches often focus on synchronizing multimodal data to enhance accuracy, overlooking the critical role of negative information. Negative information generally refers to noise or inconsistencies with primary emotional labels within the data, such as disparities in emotional expressions across different modalities and noisy data elements. These issues can significantly compromise the effectiveness of sentiment analysis systems. To address this challenge, we propose a novel method based on contrastive learning for the removal of non-relevant features within single modalities, aiming to eliminate negative information in speech, text, and image data. Additionally, we have designed an enhanced multi-head attention mechanism that integrates the cleansed features into a unified representation for emotion analysis. Experimental evaluations on the CMU-MOSI and CMU-MOSEI datasets demonstrate that our method significantly outperforms existing approaches in sentiment analysis tasks. This method not only improves accuracy but also ensures the system’s robustness against the diverse and noisy nature of real-world data. The relevant code is available at https://github.com/YaoYangWang/MECAM.
AB - Multimodal sentiment analysis bridges the communication gap between humans and machines by accurately recognizing human emotions. However, existing approaches often focus on synchronizing multimodal data to enhance accuracy, overlooking the critical role of negative information. Negative information generally refers to noise or inconsistencies with primary emotional labels within the data, such as disparities in emotional expressions across different modalities and noisy data elements. These issues can significantly compromise the effectiveness of sentiment analysis systems. To address this challenge, we propose a novel method based on contrastive learning for the removal of non-relevant features within single modalities, aiming to eliminate negative information in speech, text, and image data. Additionally, we have designed an enhanced multi-head attention mechanism that integrates the cleansed features into a unified representation for emotion analysis. Experimental evaluations on the CMU-MOSI and CMU-MOSEI datasets demonstrate that our method significantly outperforms existing approaches in sentiment analysis tasks. This method not only improves accuracy but also ensures the system’s robustness against the diverse and noisy nature of real-world data. The relevant code is available at https://github.com/YaoYangWang/MECAM.
KW - Multimodal sentiment analysis
KW - Contrastive learning
KW - Attention mechanisms
UR - http://www.scopus.com/inward/record.url?scp=105005659151&partnerID=8YFLogxK
U2 - 10.1007/s12559-025-10463-9
DO - 10.1007/s12559-025-10463-9
M3 - Article
AN - SCOPUS:105005659151
SN - 1866-9956
VL - 17
SP - 1
EP - 16
JO - Cognitive Computation
JF - Cognitive Computation
IS - 3
M1 - 107
ER -