Abstract
This work presents a highly effective strategy for attacking image captioning models through prompt engineering. The objective of this approach is to deliberately guide the output of LLMs and introduce dynamic noise into the original clean image captions, causing them to be categorized as a different class. Consequently, when the image captioning model is fine-tuned on these adversarial captions, its performance deteriorates and it produces inaccurate captions for clean images. The novelty of this attack is that it does not require the attacker to perform any model training and only requires prompting the LLMs to generate a small number of captions for the attack to be effective. Comprehensive experiments using GPT-3.5 indicate that only 100 maliciously generated LLM captions can significantly degrade image captioning model performance, by more than 50% on the BLEU metric and more than 25% on the ROUGE-L and METEOR scores.
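The record does not reproduce the authors' prompts or poisoning pipeline, but the attack described above can be illustrated with a minimal sketch: prompt an LLM to rewrite each clean caption so it reads as a different target class, then use the rewritten captions as fine-tuning data. The prompt wording, the `target_class` parameter, and the model settings below are illustrative assumptions, not the published method; the sketch assumes the OpenAI Chat Completions API.

```python
# Minimal sketch of generating adversarial captions via prompt engineering.
# Prompt wording, temperature, and target_class are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def adversarial_caption(clean_caption: str, target_class: str) -> str:
    """Ask GPT-3.5 to rewrite a clean caption so it describes a different class."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You rewrite image captions, keeping them fluent and concise."},
            {"role": "user",
             "content": (f"Rewrite this caption so that it plausibly describes "
                         f"a '{target_class}' instead of its original subject:\n"
                         f"{clean_caption}")},
        ],
        temperature=0.9,  # higher temperature varies the injected noise
    )
    return response.choices[0].message.content.strip()


# Example: poison a small caption set (the paper reports that ~100 captions suffice).
clean_captions = ["a dog running across a grassy field"]
poisoned = [adversarial_caption(c, target_class="cat") for c in clean_captions]
```

Fine-tuning an image captioning model on such poisoned caption-image pairs is what the paper evaluates with BLEU, ROUGE-L, and METEOR on clean test images.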
Original language | English |
---|---|
Title of host publication | 2024 17th International Conference on Security of Information and Networks |
Subtitle of host publication | SIN 2024 |
Place of Publication | Piscataway, NJ |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 1-7 |
Number of pages | 7 |
ISBN (Electronic) | 9798331509736 |
DOIs | |
Publication status | Published - 2024 |
Event | International Conference on Security of Information and Networks (17th : 2024) - Sydney, Australia. Duration: 2 Dec 2024 → 4 Dec 2024 |
Conference
Conference | International Conference on Security of Information and Networks (17th : 2024) |
---|---|
Abbreviated title | SIN 2024 |
Country/Territory | Australia |
City | Sydney |
Period | 2/12/24 → 4/12/24 |
Keywords
- adversarial attack
- image captioning
- large language models
- prompt engineering