Semantic equivalent adversarial data augmentation for visual question answering

Ruixue Tang, Chao Ma*, Wei Emma Zhang, Qi Wu, Xiaokang Yang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference proceeding contributionpeer-review

27 Citations (Scopus)

Abstract

Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the major tricks for DNN, has been widely used in many computer vision tasks. However, there are few works studying the data augmentation problem for VQA and none of the existing image based augmentation schemes (such as rotation and flipping) can be directly applied to VQA due to its semantic structure – an ⟨image, question, answer⟩ triplet needs to be maintained correctly. For example, a direction related Question-Answer (QA) pair may not be true if the associated image is rotated or flipped. In this paper, instead of directly manipulating images and questions, we use generated adversarial examples for both images and questions as the augmented data. The augmented examples do not change the visual properties presented in the image as well as the semantic meaning of the question, the correctness of the ⟨image, question, answer⟩ is thus still maintained. We then use adversarial learning to train a classic VQA model (BUTD) with our augmented data. We find that we not only improve the overall performance on VQAv2, but also can withstand adversarial attack effectively, compared to the baseline model. The source code is available at https://github.com/zaynmi/seada-vqa.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2020
Subtitle of host publication16th European Conference Glasgow, UK, August 23–28, 2020 Proceedings, Part XIX
EditorsAndrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Place of PublicationCham, Switzerland
PublisherSpringer, Springer Nature
Pages437-453
Number of pages17
ISBN (Electronic)9783030585297
ISBN (Print)9783030585280
DOIs
Publication statusPublished - 2020
Externally publishedYes
Event16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom
Duration: 23 Aug 202028 Aug 2020

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12364 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference16th European Conference on Computer Vision, ECCV 2020
Country/TerritoryUnited Kingdom
CityGlasgow
Period23/08/2028/08/20

Keywords

  • Adversarial learning
  • Data augmentation
  • VQA

Fingerprint

Dive into the research topics of 'Semantic equivalent adversarial data augmentation for visual question answering'. Together they form a unique fingerprint.

Cite this