Dataset construction for multimodal detection of online gambling advertisements

I Wayan Budi Sentana*, I Nyoman Gede Arya Astawa, Junda Lu, I Made Ari Dwi Suta Atmaja, Ni Ketut Pradani Gayatri Sarja, Ni Nyoman Harini Puspita

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

This study presents the construction of a multimodal dataset designed to detect online gambling advertisements that have infiltrated websites. The dataset incorporates both visual (image-based) and textual data extracted from compromised web pages. Data collection begins with a Google Engine Scraper that uses specialized search commands (commonly known as Google Hacking techniques) to identify URLs containing keywords frequently associated with online gambling in Bahasa Indonesia. Once identified, these URLs are processed by an automated Selenium-based module that retrieves and extracts the content of each webpage. The extracted content is then separated into visual and textual components. The textual data is further analyzed using a large language model (LLM) via the OpenAI API to assist in the preliminary classification of gambling-related content. Final verification and labeling are performed manually to ensure accuracy. The resulting dataset comprises 600 samples: 300 labeled positive (containing online gambling advertisements) and 300 labeled non-infiltrated, forming a balanced and validated corpus for future multimodal detection model development.
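The step in which extracted content is split into visual and textual components can be illustrated with a minimal sketch. It assumes the page HTML has already been retrieved (the abstract names a Selenium-based module for that; retrieval is not shown here) and uses Python's standard-library `html.parser` instead. The names `PageSplitter`, `split_page`, and `build_sample`, and the 0/1 label encoding, are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Collect visible text and image URLs from one HTML page."""

    # Content inside these tags is not visible text, so it is skipped.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def split_page(html):
    """Return the textual and visual components of a page as a dict."""
    parser = PageSplitter()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "images": parser.image_urls}


def build_sample(url, html, label):
    """Assemble one dataset record; label is assumed to be 1 (ad) or 0 (clean)."""
    sample = split_page(html)
    sample["url"] = url
    sample["label"] = label
    return sample
```

In a pipeline like the one described, the `text` field would be what is sent to the LLM for preliminary classification, and the `images` list would point to the visual component, with the final `label` assigned after manual verification.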
Original language: English
Title of host publication: Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2025 (iCAST-ES 2025)
Editors: Muhammad Udin Harun Al Rasyid, Mohammad Robihul Mufid, I Gede Artha Negara, Risa Nurin Baiti, Gusti Ayu Wulan Krisna Dewi, Ni Made Sintya Rani, Ni Putu Indah Yuliana
Place of Publication: Online
Publisher: Springer, Springer Nature
Pages: 39-46
Number of pages: 8
ISBN (Electronic): 9789464639261
Publication status: Published - 2025
Event: International Conference on Applied Science and Technology on Engineering Science 2025 - Hybrid, Indonesia
Duration: 10 Oct 2025 to 11 Oct 2025

Publication series

Name: Advances in Engineering Research
Publisher: Springer Nature
Volume: 283
ISSN (Print): 2731-8079
ISSN (Electronic): 2352-5401

Conference

Conference: International Conference on Applied Science and Technology on Engineering Science 2025
Abbreviated title: iCAST-ES 2025
Country/Territory: Indonesia
City: Hybrid
Period: 10/10/25 to 11/10/25

Bibliographical note

Copyright the Author(s) 2025. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Online Gambling Ad
  • Multimodal Dataset Type
  • Semantic Type Dataset
  • Visual Type Dataset
