Minimalistic attacks: how little it takes to fool deep reinforcement learning policies

Xinghua Qu, Zhu Sun, Yew Soon Ong*, Abhishek Gupta, Pengfei Wei

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Recent studies have revealed that neural-network-based policies can be easily fooled by adversarial examples. However, while most prior works analyze the effects of perturbing every pixel of every frame assuming white-box policy access, in this article we take a more restrictive view toward adversary generation, with the goal of unveiling the limits of a model's vulnerability. In particular, we explore minimalistic attacks by defining three key settings: 1) Black-Box Policy Access, where the attacker only has access to the input (state) and output (action probability) of an RL policy; 2) Fractional-State Adversary, where only a few pixels are perturbed, with the extreme case being a single-pixel adversary; and 3) Tactically Chanced Attack, where only significant frames are tactically chosen to be attacked. We formulate the adversarial attack by accommodating the three key settings, and explore their potency on six Atari games by examining four fully trained state-of-the-art policies. In Breakout, for example, we find, surprisingly, that: 1) all policies show significant performance degradation when merely 0.01% of the input state is modified and 2) the policy trained by DQN is totally deceived by perturbing only 1% of the frames.
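To make the black-box, fractional-state setting concrete, the sketch below shows one plausible (hypothetical) single-pixel adversary: it only queries the policy's action probabilities (black-box access) and uses random search to find the one-pixel perturbation that most reduces the probability of the action the clean policy prefers. This is an illustrative reconstruction, not the paper's exact optimization procedure; `n_queries` and `eps` are assumed parameters.

```python
import numpy as np

def single_pixel_attack(policy, state, n_queries=200, eps=1.0, rng=None):
    """Black-box single-pixel adversary via random search.

    policy : callable mapping a state (2-D array) to an action-probability
             vector; only its inputs and outputs are observed (black-box).
    Returns the perturbed state (differing from `state` in at most one
    pixel) and the resulting probability of the originally preferred action.
    """
    rng = np.random.default_rng(rng)
    base_probs = policy(state)
    target = int(np.argmax(base_probs))        # action the clean policy prefers
    best_state, best_prob = state, base_probs[target]
    h, w = state.shape
    for _ in range(n_queries):
        cand = state.copy()                    # perturb exactly one pixel
        i, j = rng.integers(h), rng.integers(w)
        cand[i, j] = np.clip(cand[i, j] + rng.choice([-eps, eps]), 0.0, 1.0)
        p = policy(cand)[target]
        if p < best_prob:                      # keep the most damaging pixel
            best_prob, best_state = p, cand
    return best_state, best_prob
```

A tactically chanced variant could wrap this routine and attack only frames where the policy's top-two action probabilities are close (low-margin frames), leaving the remaining frames untouched.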

Original language: English
Pages (from-to): 806-817
Number of pages: 12
Journal: IEEE Transactions on Cognitive and Developmental Systems
Volume: 13
Issue number: 4
Early online date: 19 Feb 2020
Publication status: Published - Dec 2021
Externally published: Yes

Keywords

  • Adversarial attack
  • Reinforcement learning (RL)
