A multi-modal dataset for hate speech detection on social media: case-study of Russia-Ukraine conflict

Surendrabikram Thapa, Aditya Shah, Farhan Ahmad Jafri, Usman Naseem, Imran Razzak

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

24 Citations (Scopus)

Abstract

Hate speech consists of types of content (e.g. text, audio, image) that express derogatory sentiments and hate against certain people or groups of individuals. The internet, particularly social media and microblogging sites, has become an increasingly popular platform for expressing ideas and opinions. Hate speech is prevalent in both offline and online media. A substantial proportion of this kind of content is presented in different modalities (e.g. text, image, video). Taking into account that hate speech spreads quickly during political events, we present a novel multimodal dataset composed of 5680 text-image pairs from tweets related to the Russia-Ukraine war, each annotated with a binary class: "hate" or "no-hate". The baseline results show that multimodal resources are relevant for leveraging the hateful information in different types of data. The baselines and dataset provided in this paper may help advance research on multimodal hate speech, particularly during serious conflicts such as wars.

Original language: English
Title of host publication: Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
Place of Publication: Stroudsburg
Publisher: Association for Computational Linguistics
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781959429050
Publication status: Published - 2022
Externally published: Yes
Event: 5th Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text, CASE 2022 - Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 to 8 Dec 2022

Conference

Conference: 5th Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text, CASE 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 7/12/22 to 8/12/22
