
Entity-aware optimal transport and residual attention for multimodal content moderation

Siddhant Bikram Shah, Shuvam Shiwakoti, Touhid Bhuiyan, Mohammad Ali Moni, Surendrabikram Thapa, Usman Naseem

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review


Abstract

The increasing prevalence of memes on social media platforms has amplified both the positive and negative impact of these highly shareable, multimodal artifacts. While memes can be humorous and engaging, they can also serve as vehicles for hateful or harmful content that targets specific social, ethnic, or political groups. In this paper, we propose ImTOTMeme, a novel framework for harmful meme detection that combines an optimal transport-based alignment mechanism with global residual interactions to better capture both local and contextual cues. We leverage CLIP embeddings for initial image and text representations and employ Sinkhorn iteration to learn a minimal-cost matching between fine-grained visual tokens and OCR-extracted text tokens. We further incorporate facial embeddings and entity information, allowing for more nuanced analysis of memes involving human subjects or contextual references. Through experiments on four publicly available datasets: Harm-C, Harm-P, FHM, and MultiOFF, we demonstrate that ImTOTMeme achieves competitive accuracy in both binary and multi-class settings. We further conduct an ablation study to verify the significance of each component in our framework, and use LIME-based visualizations to provide deeper interpretability into the model's classification decisions. Our findings highlight that an approach that balances local token-level alignment with broader contextual modeling can effectively detect harmful memes across diverse topical domains, paving the way for more robust and transparent content moderation on social media.
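The Sinkhorn-based matching between visual tokens and text tokens mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the cost function, the marginals, or the regularisation strength, so everything below is an assumption made for illustration: a cosine-distance cost, uniform marginals, a regularisation weight `eps`, and random vectors standing in for CLIP token embeddings. The function names `sinkhorn` and `cosine_cost` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x and rows of y (assumed cost)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn(cost, eps=0.5, n_iters=300):
    """Entropy-regularised optimal transport with uniform marginals (assumed).

    Returns a transport plan whose rows sum to 1/n and columns to 1/m.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # mass on each visual token
    b = np.full(m, 1.0 / m)   # mass on each text token
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)       # rescale rows to match a
        v = b / (K.T @ u)     # rescale columns to match b
    return u[:, None] * K * v[None, :]


# Random stand-ins for 6 visual tokens and 4 OCR text tokens (not real CLIP output).
rng = np.random.default_rng(0)
visual_tokens = rng.normal(size=(6, 16))
text_tokens = rng.normal(size=(4, 16))

cost = cosine_cost(visual_tokens, text_tokens)
plan = sinkhorn(cost)

# Soft alignment: for each visual token, the text token receiving the most mass.
best_text_match = plan.argmax(axis=1)
```

Each entry `plan[i, j]` can be read as how much of visual token `i` is matched to text token `j`. A smaller `eps` gives a sharper, closer-to-minimal-cost plan but needs more iterations and is usually implemented in the log domain for numerical stability.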

Original language: English
Title of host publication: WWW Companion '25
Subtitle of host publication: Companion proceedings of the ACM Web Conference 2025
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 2306-2313
Number of pages: 8
ISBN (Electronic): 9798400713316
DOIs
Publication status: Published - 2025
Event: 34th ACM Web Conference, WWW Companion 2025 - Sydney, Australia
Duration: 28 Apr 2025 - 2 May 2025

Conference

Conference: 34th ACM Web Conference, WWW Companion 2025
Country/Territory: Australia
City: Sydney
Period: 28/04/25 - 2/05/25

Bibliographical note

Copyright the Author(s) 2025. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Alternative title of the host publication: "WWW '25: Companion Proceedings of the ACM on Web Conference 2025"; "Companion Proceedings of the ACM Web Conference 2025 (WWW Companion '25), April 28-May 2, 2025, Sydney, NSW, Australia"

Keywords

  • optimal transport
  • facial features
  • multimodal content analysis
