Abstract
The increasing prevalence of memes on social media platforms has amplified both the positive and negative impact of these highly shareable, multimodal artifacts. While memes can be humorous and engaging, they can also serve as vehicles for hateful or harmful content that targets specific social, ethnic, or political groups. In this paper, we propose ImTOTMeme, a novel framework for harmful meme detection that combines an optimal transport-based alignment mechanism with global residual interactions to better capture both local and contextual cues. We leverage CLIP embeddings for initial image and text representations and employ Sinkhorn iteration to learn a minimal-cost matching between fine-grained visual tokens and OCR-extracted text tokens. We further incorporate facial embeddings and entity information, allowing for more nuanced analysis of memes involving human subjects or contextual references. In experiments on four publicly available datasets (Harm-C, Harm-P, FHM, and MultiOFF), we demonstrate that ImTOTMeme achieves competitive accuracy in both binary and multi-class settings. We further conduct an ablation study to verify the significance of each component in our framework, and use LIME-based visualizations to make the model's classification decisions more interpretable. Our findings indicate that balancing local token-level alignment with broader contextual modeling can effectively detect harmful memes across diverse topical domains, paving the way for more robust and transparent content moderation on social media.
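The Sinkhorn-based token matching mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `cosine_cost` and `sinkhorn_plan`, the cosine-distance cost, the uniform marginals, the regularization strength `eps`, and the random vectors standing in for CLIP visual and OCR text token embeddings are all assumptions made for illustration.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x and rows of y (assumed cost)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn_plan(cost, eps=0.5, n_iters=500):
    """Entropy-regularized transport plan between uniform marginals.

    `eps` and `n_iters` are illustrative defaults, not values from the paper.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass over visual tokens (assumption)
    b = np.full(m, 1.0 / m)   # uniform mass over text tokens (assumption)
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


# Random stand-ins for 6 visual-token and 4 text-token embeddings (hypothetical data).
rng = np.random.default_rng(0)
visual_tokens = rng.normal(size=(6, 16))
text_tokens = rng.normal(size=(4, 16))

cost = cosine_cost(visual_tokens, text_tokens)
plan = sinkhorn_plan(cost)

# plan[i, j] is the mass moved from visual token i to text token j;
# larger entries mark token pairs the soft matching considers well aligned.
total_cost = float((plan * cost).sum())
```

Each row of `plan` sums to the mass of one visual token and each column to the mass of one text token, so the plan acts as a soft alignment matrix rather than a hard one-to-one assignment.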
| Original language | English |
|---|---|
| Title of host publication | WWW Companion '25 |
| Subtitle of host publication | Companion proceedings of the ACM Web Conference 2025 |
| Place of Publication | New York |
| Publisher | Association for Computing Machinery |
| Pages | 2306-2313 |
| Number of pages | 8 |
| ISBN (Electronic) | 9798400713316 |
| DOIs | |
| Publication status | Published - 2025 |
| Event | 34th ACM Web Conference, WWW Companion 2025 - Sydney, Australia. Duration: 28 Apr 2025 → 2 May 2025 |
Conference
| Conference | 34th ACM Web Conference, WWW Companion 2025 |
|---|---|
| Country/Territory | Australia |
| City | Sydney |
| Period | 28/04/25 → 2/05/25 |
Bibliographical note
Copyright the Author(s) 2025. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher. Alternative title of the host publication: "WWW '25: Companion Proceedings of the ACM on Web Conference 2025"; "Companion Proceedings of the ACM Web Conference 2025 (WWW Companion '25), April 28-May 2, 2025, Sydney, NSW, Australia"
Keywords
- optimal transport
- facial features
- multimodal content analysis
Fingerprint
Dive into the research topics of 'Entity-aware optimal transport and residual attention for multimodal content moderation'. Together they form a unique fingerprint.