Prompt engineering adversarial attack against image captioning models

Hiep Vo, Shui Yu, Xi Zheng

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

This work presents a highly effective strategy for attacking image captioning models through the use of prompt engineering. The objective of this approach is to deliberately guide the output of LLMs and introduce dynamic noise into the original clean image captions, causing them to be categorized as a different class. Consequently, when the image captioning model is fine-tuned on these adversarial captions, its performance deteriorates and it produces inaccurate captions for clean images. The novelty of this attack is that it does not require the attacker to perform any model training and only requires prompting the LLM to generate a small number of captions for the attack to be effective. Comprehensive experiments using GPT-3.5 indicate that as few as 100 maliciously generated captions can significantly degrade image captioning model performance, by over 50% on the BLEU metric and over 25% on the ROUGE-L and METEOR scores.
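The paper's exact prompts and pipeline are not reproduced in this record; the sketch below only illustrates the general idea described in the abstract, assuming the OpenAI chat completions API. The prompt wording, the `gpt-3.5-turbo` model string, the target class, and the 100-caption budget are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the attack idea: prompt an LLM to inject "dynamic noise" into
# clean captions so each caption describes a different class, then use the poisoned
# captions in the victim captioning model's fine-tuning set. All prompt text and
# parameter choices here are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ADVERSARIAL_PROMPT = (
    "Rewrite the following image caption so that it plausibly describes an object "
    "of the class '{target_class}' instead of its original subject, while keeping "
    "the sentence fluent and natural. Return only the rewritten caption.\n\n"
    "Caption: {caption}"
)

def generate_adversarial_caption(caption: str, target_class: str) -> str:
    """Ask the LLM to rewrite one clean caption toward a different class."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": ADVERSARIAL_PROMPT.format(
                target_class=target_class, caption=caption),
        }],
        temperature=0.9,  # higher temperature gives more varied ("dynamic") noise
    )
    return response.choices[0].message.content.strip()

# Poison a small budget of captions (the abstract reports ~100 is enough).
clean_captions = [
    "a brown dog running across a grassy field",
    "two children playing soccer in a park",
]
poisoned = [generate_adversarial_caption(c, target_class="cat")
            for c in clean_captions[:100]]
# `poisoned` would then replace the corresponding captions in the victim model's
# fine-tuning data; the attacker never trains a model or accesses its gradients.
```

Under this setup, the attack's effect would be measured as in the abstract: fine-tune the victim model on the poisoned captions and compare BLEU, ROUGE-L, and METEOR scores on clean images against the unpoisoned baseline.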

Original language: English
Title of host publication: 2024 17th International Conference on Security of Information and Networks
Subtitle of host publication: SIN 2024
Place of Publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1-7
Number of pages: 7
ISBN (Electronic): 9798331509736
DOIs
Publication status: Published - 2024
Event: International Conference on Security of Information and Networks (17th : 2024) - Sydney, Australia
Duration: 2 Dec 2024 – 4 Dec 2024

Conference

Conference: International Conference on Security of Information and Networks (17th : 2024)
Abbreviated title: SIN 2024
Country/Territory: Australia
City: Sydney
Period: 2/12/24 – 4/12/24

Keywords

  • adversarial attack
  • image captioning
  • large language models
  • prompt engineering
