PICK-OBJECT-ATTACK: type-specific adversarial attack for object detection

Omid Mohamad Nezami*, Akshay Chaturvedi, Mark Dras, Utpal Garain

*Corresponding author for this work

Research output: Contribution to journal, article, peer-reviewed

6 Citations (Scopus)


Many recent studies have shown that deep neural models are vulnerable to adversarial samples: images with imperceptible perturbations, for example, can fool image classifiers. In this paper, we present the first type-specific approach to generating adversarial examples for object detection. Object detection entails detecting bounding boxes around multiple objects present in the image and classifying them at the same time, which makes it a harder task to attack than image classification. We specifically aim to attack the widely used Faster R-CNN by changing the predicted label for a particular object in an image: where prior work has targeted one specific object (a stop sign), we generalize to arbitrary objects, with the key challenge being the need to change the labels of all bounding boxes for all instances of that object type. To do so, we propose a novel method, named PICK-OBJECT-ATTACK. PICK-OBJECT-ATTACK successfully adds perturbations only to bounding boxes for the targeted object, preserving the labels of other detected objects in the image. In terms of perceptibility, the perturbations induced by the method are very small. Furthermore, for the first time, we examine the effect of adversarial attacks on object detection in terms of a downstream task, image captioning: we show that where a method that can modify all object types leads to very obvious changes in captions, the changes from our constrained attack are much less apparent.
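The idea of confining a perturbation to the bounding boxes of one object type can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `box_mask` and `apply_masked_perturbation`, the box format `(x1, y1, x2, y2)`, and the constant `delta` are all assumptions made for illustration. In an actual attack the perturbation would come from an optimization against the detector, which is not shown here; the sketch covers only the masking step.

```python
import numpy as np

def box_mask(shape, boxes):
    """Return an (H, W, 1) mask that is 1 inside any box and 0 elsewhere.

    Boxes are assumed to be integer (x1, y1, x2, y2) pixel coordinates,
    with x2 and y2 exclusive. This format is a hypothetical choice.
    """
    h, w = shape[:2]
    mask = np.zeros((h, w, 1), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        # Clamp each box to the image bounds before filling it in.
        mask[max(y1, 0):min(y2, h), max(x1, 0):min(x2, w)] = 1.0
    return mask

def apply_masked_perturbation(image, delta, boxes):
    """Add delta only inside the boxes, keeping pixel values in [0, 1]."""
    mask = box_mask(image.shape, boxes)
    return np.clip(image + mask * delta, 0.0, 1.0)

# Toy example: an 8x8 grey image and one hypothetical box for the targeted type.
image = np.full((8, 8, 3), 0.5, dtype=np.float32)
delta = np.full_like(image, 0.01)   # stand-in for an optimized perturbation
boxes = [(2, 2, 5, 5)]

adv = apply_masked_perturbation(image, delta, boxes)
changed = np.any(adv != image, axis=-1)  # True where any channel was modified
```

Pixels outside the listed boxes are left untouched, which is the property the abstract relies on to preserve the labels of other detected objects.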

Original language: English
Article number: 103257
Pages (from-to): 1-7
Number of pages: 7
Journal: Computer Vision and Image Understanding
Publication status: Published - Oct 2021


  • Adversarial attack
  • Faster R-CNN
  • Deep learning
  • Image captioning
  • Computer vision

