Query-efficient black-box adversarial attacks on automatic speech recognition

Chuxuan Tong, Xi Zheng*, Jianhua Li, Xingjun Ma, Longxiang Gao*, Yong Xiang

*Corresponding author for this work

Research output: Contribution to journal, Article (peer-reviewed)

7 Citations (Scopus)

Abstract

The susceptibility of Deep Neural Networks (DNNs) to adversarial attacks has raised concerns about their deployment in real-world applications. Although the vulnerability of DNNs to adversarial attacks has been studied extensively in the image domain, research in the audio domain remains limited, particularly in the black-box setting against Automatic Speech Recognition (ASR) models. Several kinds of black-box attacks have been proposed against ASR models, including transfer attacks, hardware attacks, and query-based attacks; this study focuses on query-based attacks. The article introduces a new gradient estimation technique, Temporal Natural Evolution Strategies (T-NES), which generates adversarial audio samples more efficiently than existing attacks. T-NES exploits the temporal correlation inherent in audio to accelerate gradient estimation from the probability scores returned by the target model. Experiments on two benchmark datasets (LibriSpeech and TEDLIUM) and two state-of-the-art ASR models (DeepSpeech2 and Wav2Letter) show that, within a budget of 500 queries, T-NES generates successful attacks with up to 30% fewer queries than existing attacks. T-NES could therefore serve as a strong baseline for evaluating the black-box adversarial vulnerability of ASR systems.
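To make the idea of query-based gradient estimation concrete, here is a minimal sketch of a generic Natural Evolution Strategies (NES) estimator with antithetic sampling. The abstract does not specify how T-NES uses temporal correlation, so this sketch assumes a simple stand-in: one Gaussian value is drawn per block of `block` consecutive audio samples and shared across that block. Sharing noise across neighbouring samples shrinks the effective search dimension from the number of samples to the number of blocks. The names `nes_gradient`, `correlated_noise`, `block`, and the scalar `loss` callback (a hypothetical black-box query built from the model's probability scores) are all illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import Callable, List


def correlated_noise(length: int, block: int, rng: random.Random) -> List[float]:
    """Draw one N(0, 1) value per block of `block` consecutive samples and
    repeat it across the block. This is a hypothetical stand-in for temporally
    correlated noise; block=1 gives ordinary i.i.d. NES noise."""
    n_blocks = math.ceil(length / block)
    coarse = [rng.gauss(0.0, 1.0) for _ in range(n_blocks)]
    return [coarse[i // block] for i in range(length)]


def nes_gradient(
    loss: Callable[[List[float]], float],  # hypothetical black-box query
    x: List[float],                        # current audio waveform
    sigma: float,                          # search (smoothing) radius
    n_pairs: int,                          # antithetic pairs; costs 2 * n_pairs queries
    block: int = 1,
    seed: int = 0,
) -> List[float]:
    """Antithetic NES estimate of the gradient of `loss` at `x`."""
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(n_pairs):
        u = correlated_noise(len(x), block, rng)
        # Two queries per pair: probe the model at x + sigma*u and x - sigma*u.
        plus = loss([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = loss([xi - sigma * ui for xi, ui in zip(x, u)])
        # Symmetric finite difference along u, averaged over all pairs.
        scale = (plus - minus) / (2.0 * sigma * n_pairs)
        for i, ui in enumerate(u):
            grad[i] += scale * ui
    return grad
```

An attack loop would repeatedly call `nes_gradient` and take a small signed step on the waveform while tracking the total number of queries. With `block > 1`, every sample inside a block receives the same update, which trades per-sample detail for a less noisy estimate at a fixed query budget.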

Original language: English
Pages (from-to): 3981-3992
Number of pages: 12
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 31
Publication status: Published, 2023

