MLInitiative@WILDRE7: hybrid approaches with large language models for enhanced sentiment analysis in code-switched and code-mixed texts

Hariram Veeramani, Surendrabikram Thapa, Usman Naseem

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Code-switched and code-mixed languages are prevalent in multilingual societies, reflecting the complex interplay of cultures and languages in daily communication. Understanding the sentiment embedded in such texts is crucial for a range of applications, from improving social media analytics to enhancing customer feedback systems. Despite their significance, research on code-mixed and code-switched languages remains limited, particularly for less-resourced languages. This scarcity of research creates a gap in natural language processing (NLP) technologies, hindering their ability to accurately interpret the rich linguistic diversity of global communications. To bridge this gap, this paper presents a novel methodology for sentiment analysis in code-mixed and code-switched texts. Our approach combines the power of large language models (LLMs) with the versatility of the multilingual BERT (mBERT) framework to process and analyze sentiment in multilingual data. By decomposing code-mixed texts into their constituent languages, employing mBERT for named entity recognition (NER) and sentiment label prediction, and integrating these insights into a decision-making LLM, we provide a comprehensive framework for understanding sentiment in complex linguistic contexts. Our system achieves a competitive rank on all subtasks of the Code-mixed Less-Resourced Sentiment analysis (Code-mixed) shared task at WILDRE-7 (LREC-COLING).
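The three stages named in the abstract (decomposition into constituent languages, per-language NER and sentiment prediction, and a final LLM decision) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `Evidence` record, the prompt wording, and the positive/negative/neutral label set are all hypothetical. Splitting by Unicode script is only a crude stand-in for language identification (romanized code-mixed text would need a real language identifier), and the mBERT models and the LLM are passed in as plain callables so the sketch runs without any model downloads.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable


def script_of(token: str) -> str:
    """Toy language proxy (assumption): Unicode script of the first letter."""
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            return name.split(" ")[0].lower() if name else "other"
    return "other"


def decompose(text: str) -> dict[str, str]:
    """Group whitespace tokens by script, preserving token order."""
    groups: dict[str, list[str]] = {}
    for token in text.split():
        groups.setdefault(script_of(token), []).append(token)
    return {lang: " ".join(tokens) for lang, tokens in groups.items()}


@dataclass
class Evidence:
    """Hypothetical record of what the per-language models report."""
    language: str
    segment: str
    entities: list[str]
    sentiment: str


def gather_evidence(
    text: str,
    ner: Callable[[str], list[str]],   # stand-in for an mBERT NER model
    classify: Callable[[str], str],    # stand-in for an mBERT sentiment model
) -> list[Evidence]:
    """Run both per-segment predictors on each language segment."""
    return [
        Evidence(lang, seg, ner(seg), classify(seg))
        for lang, seg in decompose(text).items()
    ]


def build_prompt(text: str, evidence: list[Evidence]) -> str:
    """Fold the per-language evidence into one prompt for the deciding LLM."""
    lines = [f"Text: {text}", "Per-language evidence:"]
    for ev in evidence:
        ents = ", ".join(ev.entities) or "none"
        lines.append(
            f"- [{ev.language}] {ev.segment!r}: entities={ents}; predicted={ev.sentiment}"
        )
    # Assumed label set; the shared task's actual labels may differ.
    lines.append("Answer with one label: positive, negative, or neutral.")
    return "\n".join(lines)


def decide(text: str, ner, classify, llm: Callable[[str], str]) -> str:
    """Final decision: the LLM sees the text plus the collected evidence."""
    prompt = build_prompt(text, gather_evidence(text, ner, classify))
    return llm(prompt).strip().lower()


if __name__ == "__main__":
    # Stubs in place of real models, purely for illustration.
    stub_ner = lambda seg: [t for t in seg.split() if t.istitle()]
    stub_classify = lambda seg: "positive" if "अच्छी" in seg else "neutral"
    stub_llm = lambda prompt: "Positive"
    print(decide("यह movie बहुत अच्छी thi", stub_ner, stub_classify, stub_llm))
```

Passing the models in as callables keeps the control flow visible; in practice each stub would be replaced by a call to a fine-tuned mBERT model or an LLM endpoint of the reader's choosing.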

Original language: English
Title of host publication: Proceedings of the 7th Workshop on Indian Language Data Resource and Evaluation @LREC-COLING-2024 (WILDRE-7)
Editors: Girish Jha, Sobha Lalitha Devi, Kalika Bali, Atul Kr. Ojha
Place of Publication: Paris
Publisher: European Language Resources Association (ELRA)
Pages: 66-72
Number of pages: 7
ISBN (Electronic): 9782493814371
Publication status: Published - 2024
Event: 7th Workshop on Indian Language Data Resource and Evaluation, WILDRE 2024 - Torino, Italy
Duration: 25 May 2024

Conference

Conference: 7th Workshop on Indian Language Data Resource and Evaluation, WILDRE 2024
Country/Territory: Italy
City: Torino
Period: 25/05/24

Bibliographical note

Copyright the Publisher 2024. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Code-switched language
  • Sentiment analysis
  • Named entity recognition (NER)
  • Large language models (LLMs)
