On adversarial training with incorrect labels

Benjamin Zhao, Junda Lu, Xiaowei Zhou, Dinusha Vatsalan, Muhammad Ikram, Dali Kaafar

Research output: Contribution to conference › Paper › peer-review

Abstract

In this work, we study adversarial training in the presence of incorrectly labeled data. Specifically, we examine the predictive performance of an adversarially trained Machine Learning (ML) model on clean data when the training data and adversarial examples carry erroneous labels. Such erroneous labels may arise organically from a flawed labeling process, or maliciously, as from a poisoning attacker. We extensively investigate the effect of incorrect labels on model accuracy and robustness with variations to 1) when incorrect labels are applied in the adversarial training process, 2) the extent of data impacted by incorrect labels (a poisoning rate), 3) the consistency of the incorrect labels, either applied randomly or with a constant mapping, 4) the model architecture used for classification, and 5) an ablation study on varying training settings of pretraining, adversarial initialization, and adversarial training strength. We further observe generalized effects over multiple datasets. A label may be changed to an incorrect one in the training dataset before the model is trained, or during adversarial sample curation, when annotators mislabel the sourced adversarial example. Interestingly, our results indicate that this flawed adversarial training process may counter-intuitively function as data augmentation, yielding improved outcomes for the adversarial robustness of the model.
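As an illustration of variations 2) and 3) in the abstract, the following minimal Python sketch shows one way a poisoning rate and a random or constant label mapping could be applied to a list of integer class labels. The function name `flip_labels`, its parameters, the choice of `(y + 1) % num_classes` as the constant mapping, and the fixed seed are all assumptions made for illustration; the procedure used in the paper may differ.

```python
import random


def flip_labels(labels, num_classes, poison_rate, mode="random", seed=0):
    """Return a copy of `labels` with a fraction replaced by incorrect labels.

    Hypothetical helper, not taken from the paper.
    labels      -- list of integer class labels in [0, num_classes)
    poison_rate -- fraction of samples whose label is made incorrect
    mode        -- "random": each poisoned sample gets a uniformly drawn wrong class
                   "constant": every class y is always mapped to (y + 1) % num_classes
    """
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to produce a wrong label")

    rng = random.Random(seed)
    n_poison = round(poison_rate * len(labels))
    # Pick which sample indices receive an incorrect label.
    poisoned_idx = rng.sample(range(len(labels)), n_poison)

    flipped = list(labels)
    for i in poisoned_idx:
        y = labels[i]
        if mode == "constant":
            # Assumed constant mapping: shift to the next class, wrapping around.
            flipped[i] = (y + 1) % num_classes
        elif mode == "random":
            # A non-zero offset guarantees the new label differs from the true one.
            offset = rng.randrange(1, num_classes)
            flipped[i] = (y + offset) % num_classes
        else:
            raise ValueError("mode must be 'random' or 'constant'")
    return flipped


# Usage: poison 20% of a toy 10-class label list with a constant mapping.
clean = [i % 10 for i in range(100)]
noisy = flip_labels(clean, num_classes=10, poison_rate=0.2, mode="constant")
```

In this sketch the same function could be applied either to the original training labels or to the labels attached to generated adversarial examples, which corresponds to the two points in the pipeline where the abstract says mislabeling may occur.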
Original language: English
Number of pages: 16
Publication status: Accepted/In press - 4 Sept 2024
Event: International Web Information Systems Engineering conference - Doha, Qatar
Duration: 2 Dec 2024 to 5 Dec 2024
Conference number: 25
https://wise2024-qatar.com

Conference

Conference: International Web Information Systems Engineering conference
Abbreviated title: WISE
Country/Territory: Qatar
City: Doha
Period: 2/12/24 to 5/12/24
