Adversarial attacks on deep-learning models in natural language processing: a survey

Research output: Contribution to journal › Review article › peer-review

22 Citations (Scopus)

Abstract

With the development of high-performance computing devices, deep neural networks (DNNs) have in recent years gained significant popularity in many Artificial Intelligence (AI) applications. However, previous work has shown that DNNs are vulnerable to strategically modified samples, known as adversarial examples. These samples are generated with imperceptible perturbations, yet they can fool DNNs into making false predictions. Inspired by the popularity of generating adversarial examples against DNNs in Computer Vision (CV), research on attacking DNNs for Natural Language Processing (NLP) applications has emerged in recent years. However, the intrinsic differences between images (CV) and text (NLP) make it challenging to apply CV attack methods directly to NLP. Various methods have been proposed that address these differences and attack a wide range of NLP applications. In this article, we present a systematic survey of these works. We collect all related academic works since the first one appeared in 2017. We then select, summarize, discuss, and analyze 40 representative works in a comprehensive way. To make the article self-contained, we cover preliminary knowledge of NLP and discuss related seminal works in computer vision. We conclude our survey with a discussion of open issues that must be addressed to bridge the gap between existing progress and more robust adversarial attacks on NLP DNNs.
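To make the notion of an "imperceptible perturbation" concrete, below is a minimal sketch of the Fast Gradient Sign Method (FGSM), one of the seminal CV attacks the abstract alludes to. This is an illustrative sketch, not the survey's own method; the model, inputs, and epsilon value are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.01):
    """Fast Gradient Sign Method sketch (after Goodfellow et al., 2015).

    Shifts input `x` by `epsilon` in the direction of the sign of the
    loss gradient, producing an example that is visually close to `x`
    but may be misclassified by `model`.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that increases the loss the most.
    x_adv = x + epsilon * x.grad.sign()
    # Keep pixel values in the valid [0, 1] range.
    return x_adv.clamp(0, 1).detach()
```

Because text is discrete, this gradient step cannot be taken directly in an NLP model's input space; textual attacks of the kind the survey covers typically perturb word embeddings and map back to valid tokens, or apply character- and word-level substitutions instead.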

Original language: English
Article number: 24
Pages (from-to): 1-41
Number of pages: 41
Journal: ACM Transactions on Intelligent Systems and Technology
Volume: 11
Issue number: 3
DOIs
Publication status: Published - Apr 2020

Keywords

  • adversarial examples
  • deep neural networks
  • natural language processing
  • textual data
