Abstract
Code-switched and code-mixed languages are prevalent in multilingual societies, reflecting the complex interplay of cultures and languages in daily communication. Understanding the sentiment in such texts is crucial for a range of applications, from social media analytics to customer feedback systems. Despite their significance, code-mixed and code-switched languages remain under-researched, particularly when less-resourced languages are involved. This scarcity leaves a gap in natural language processing (NLP) technologies and limits their ability to interpret the linguistic diversity of global communication accurately. To help close this gap, this paper presents a novel methodology for sentiment analysis of code-mixed and code-switched texts. Our approach combines large language models (LLMs) with the multilingual BERT (mBERT) model to process and analyze sentiment in multilingual data. We decompose code-mixed texts into their constituent languages, use mBERT for named entity recognition (NER) and sentiment label prediction, and feed these outputs into an LLM that makes the final decision. Together, these components form a comprehensive framework for understanding sentiment in complex linguistic contexts. Our system achieves competitive rankings on all subtasks of the Code-mixed Less-Resourced Sentiment analysis (Code-mixed) shared task at WILDRE-7 (LREC-COLING).
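The pipeline outlined in the abstract (language decomposition, per-segment sentiment and NER predictions, and a decision-making LLM) can be sketched as below. This is a minimal illustration of the data flow only, not the authors' implementation. The tagger, sentiment model, entity tagger, lexicons, example sentence, and prompt wording are hypothetical toy stand-ins, and no particular mBERT checkpoint or LLM API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical component interfaces. In the described system, mBERT fills the
# sentiment and NER roles and an LLM makes the final call; here they are plain
# callables so the data flow can run without any model downloads.
LanguageTagger = Callable[[str], str]        # token -> language code
SentimentModel = Callable[[str], str]        # text -> "positive" | "negative" | "neutral"
EntityTagger = Callable[[str], List[str]]    # text -> list of named entities


@dataclass
class SegmentAnalysis:
    language: str
    text: str
    sentiment: str
    entities: List[str]


def decompose(text: str, tag: LanguageTagger) -> Dict[str, str]:
    """Split a code-mixed sentence into one segment per predicted language."""
    groups: Dict[str, List[str]] = {}
    for token in text.split():
        groups.setdefault(tag(token), []).append(token)
    return {lang: " ".join(tokens) for lang, tokens in groups.items()}


def analyse(text: str, tag: LanguageTagger, sentiment: SentimentModel,
            ner: EntityTagger) -> List[SegmentAnalysis]:
    """Run the per-segment sentiment and NER models on each language segment."""
    return [SegmentAnalysis(lang, seg, sentiment(seg), ner(seg))
            for lang, seg in decompose(text, tag).items()]


def build_decision_prompt(text: str, analyses: List[SegmentAnalysis]) -> str:
    """Collect the per-segment evidence into a prompt for the decision LLM."""
    lines = [f"Sentence: {text}", "Per-language evidence:"]
    for a in analyses:
        entities = ", ".join(a.entities) or "none"
        lines.append(f"- [{a.language}] {a.text!r}: sentiment={a.sentiment}; entities={entities}")
    lines.append("Answer with one label: positive, negative, or neutral.")
    return "\n".join(lines)


# Toy stand-ins (hypothetical): a tiny romanized-Hindi lexicon, keyword-based
# sentiment, and a one-entry gazetteer. A real system would use trained models.
HINDI_WORDS = {"bahut", "accha", "tha", "nahi", "bura"}
KNOWN_ENTITIES = {"Sholay"}


def toy_tagger(token: str) -> str:
    return "hi" if token.lower() in HINDI_WORDS else "en"


def toy_sentiment(text: str) -> str:
    words = set(text.lower().split())
    if words & {"accha", "good", "great"}:
        return "positive"
    if words & {"bura", "bad"}:
        return "negative"
    return "neutral"


def toy_ner(text: str) -> List[str]:
    return [w for w in text.split() if w in KNOWN_ENTITIES]


if __name__ == "__main__":
    sentence = "The movie Sholay bahut accha tha"
    print(build_decision_prompt(sentence, analyse(sentence, toy_tagger, toy_sentiment, toy_ner)))
```

For the example sentence, the English segment "The movie Sholay" is tagged neutral with the entity Sholay, and the Hindi segment "bahut accha tha" is tagged positive; the resulting prompt string would then be passed to whichever LLM makes the final decision. Grouping tokens by a language tag is only one possible way to decompose the text, and the sketch does not claim it matches the paper's procedure. Keeping each model behind a plain callable lets real mBERT checkpoints or an LLM client be swapped in without changing the orchestration code.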
Original language | English |
---|---|
Title of host publication | Proceedings of the 7th Workshop on Indian Language Data Resource and Evaluation @LREC-COLING-2024 (WILDRE-7) |
Editors | Girish Jha, Sobha Lalitha Devi, Kalika Bali, Atul Kr. Ojha |
Place of Publication | Paris |
Publisher | European Language Resources Association (ELRA) |
Pages | 66-72 |
Number of pages | 7 |
ISBN (Electronic) | 9782493814371 |
Publication status | Published - 2024 |
Event | 7th Workshop on Indian Language Data Resource and Evaluation, WILDRE 2024 - Torino, Italy |
Duration | 25 May 2024 → 25 May 2024 |
Conference
Conference | 7th Workshop on Indian Language Data Resource and Evaluation, WILDRE 2024 |
---|---|
Country/Territory | Italy |
City | Torino |
Period | 25/05/24 → 25/05/24 |
Bibliographical note
Copyright the Publisher 2024. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Keywords
- Code-switched language
- Sentiment analysis
- Named entity recognition (NER)
- Large language models (LLMs)