Abstract
In this work, we study adversarial training in the presence of incorrectly labeled data. Specifically, we examine the predictive performance of an adversarially trained Machine Learning (ML) model on clean data when the labels of the training data and of the adversarial examples are erroneous. Such erroneous labels may arise organically from a flawed labeling process, or maliciously, akin to a poisoning attack. We extensively investigate the effect of incorrect labels on model accuracy and robustness, varying 1) when incorrect labels are applied in the adversarial training process, 2) the extent of data impacted by incorrect labels (a poisoning rate), 3) the consistency of the incorrect labels, applied either randomly or with a constant mapping, 4) the model architecture used for classification, and 5) in an ablation study, the training settings of pretraining, adversarial initialization, and adversarial training strength, while further observing generalized effects over multiple datasets. A label may change to an incorrect one before the model is trained, i.e., in the training dataset, or during adversarial sample curation, where annotators make mistakes labeling the sourced adversarial examples. Interestingly, our results indicate that this flawed adversarial training process may counter-intuitively function as data augmentation, yielding improved outcomes for the adversarial robustness of the model.
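Points 2 and 3 of the study design, the poisoning rate and the consistency of the incorrect labels, can be illustrated with a minimal sketch. The Python snippet below is an illustrative assumption, not the paper's code: the function name `corrupt_labels`, its signature, and the particular constant mapping c → (c + 1) mod K are all hypothetical; the abstract only states that labels are flipped either randomly or via a constant mapping for a given fraction of the data.

```python
import numpy as np

def corrupt_labels(labels, num_classes, poison_rate, mode="random", rng=None):
    """Flip a `poison_rate` fraction of labels.

    mode="random":   each poisoned label becomes a uniformly drawn
                     *different* class (inconsistent noise).
    mode="constant": a fixed class mapping is applied; here we assume
                     c -> (c + 1) mod num_classes for illustration.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    corrupted = labels.copy()
    num_poisoned = int(poison_rate * len(labels))
    # Choose which examples receive an incorrect label.
    idx = rng.choice(len(labels), size=num_poisoned, replace=False)
    if mode == "random":
        for i in idx:
            # Draw uniformly from the classes other than the true one.
            candidates = [c for c in range(num_classes) if c != corrupted[i]]
            corrupted[i] = rng.choice(candidates)
    elif mode == "constant":
        # Same wrong class for every poisoned example of a given class.
        corrupted[idx] = (corrupted[idx] + 1) % num_classes
    else:
        raise ValueError(f"unknown mode: {mode}")
    return corrupted

# Poison 20% of a toy 3-class label vector in both modes.
y = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
print(corrupt_labels(y, num_classes=3, poison_rate=0.2, mode="random"))
print(corrupt_labels(y, num_classes=3, poison_rate=0.2, mode="constant"))
```

The distinction matters because a constant mapping corrupts every affected example in the same direction, resembling a targeted poisoning attack, whereas random flips behave more like unstructured annotation noise; the study considers both, and additionally varies whether the corruption affects the training set, the curated adversarial examples, or both.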
Original language | English |
---|---|
Number of pages | 16 |
Publication status | Accepted/In press - 4 Sept 2024 |
Event | International Web Information Systems Engineering conference - Doha, Qatar. Duration: 2 Dec 2024 → 5 Dec 2024. Conference number: 25. https://wise2024-qatar.com |
Conference
Conference | International Web Information Systems Engineering conference |
---|---|
Abbreviated title | WISE |
Country/Territory | Qatar |
City | Doha |
Period | 2/12/24 → 5/12/24 |
Internet address | https://wise2024-qatar.com |