März, Lars Steffen (2024)
Robust Optimization for Adversarial Deep Learning.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00026745
Master Thesis, Primary publication, Publisher's Version
Abstract
Recent results demonstrated that images can be adversarially perturbed in visually indistinguishable ways to mislead classifiers with high standard accuracy into confident misclassifications. Adversarial examples can even be targeted at a class of the attacker's choosing, and they transfer between different DNNs in a black-box setting: perturbations computed on one DNN are likely to confuse other DNNs as well. This poses a concrete and acute security risk in digital domains such as content moderation, and also in physical contexts such as facial recognition and autonomous driving, where adversarial samples have been shown to survive printing and re-capturing. The phenomenon was first discovered in 2014 by Szegedy et al. and has since been the subject of hundreds of papers, from both the attacker's and the defender's point of view. There seems to be no end to the arms race of frequently published attacks and defenses, as no universal, provable, and practical prevention method has been developed yet. In this work, we show that verifying ReLU-based DNNs against adversarial examples is NP-hard. Furthermore, we model the adversarial training problem as a distributionally robust optimization problem to provide a formal framework for two of the most promising defenses so far: randomized FGSM-based adversarial training and randomized smoothing. Additionally, we propose two step size schemes for multi-step adversarial attacks that yield unprecedentedly low true-label confidences. To make p-norm-bounded attacks more comparable across different values of p, we define two norm rescaling functions and validate them on ImageNet. Moreover, by analyzing cosine similarities of model parameter gradients on ImageNet, we explain from an empirical data augmentation perspective why first-order adversarial training is successful despite lacking the mathematical guarantees of Danskin's Theorem. Finally, we update the performance results of Giughi et al. for BOBYQA black-box attacks on CIFAR-10 by exposing instances of the two aforementioned state-of-the-art defenses to these attacks.
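The abstract names randomized FGSM-based adversarial training as one of the two defenses framed as distributionally robust optimization. For illustration only, here is a minimal NumPy sketch of the perturbation step at the core of such training. It uses a random start inside the L-infinity ball of radius eps, one step along the sign of the input gradient, and then projection back onto the ball and the valid pixel range [0, 1]. This is not the thesis's implementation. The toy binary logistic-regression model, the helper names `logistic_loss_grad_x` and `randomized_fgsm`, and the values eps = 8/255 and alpha = 1.25 * eps are assumptions chosen for the example. The subsequent parameter update on `x_adv` is omitted.

```python
import numpy as np

# Assumption: a toy binary logistic-regression model stands in for a DNN.
# Loss per sample: log(1 + exp(-y * (w . x + b))), with labels y in {-1, +1}.

def logistic_loss_grad_x(x, y, w, b):
    """Gradient of the logistic loss with respect to the inputs x (shape n x d)."""
    margin = y * (x @ w + b)
    # d/dx log(1 + exp(-m)) = -sigmoid(-m) * y * w, and sigmoid(-m) = 1 / (1 + exp(m))
    coeff = -y / (1.0 + np.exp(margin))
    return coeff[:, None] * w[None, :]

def randomized_fgsm(x, y, w, b, eps, alpha, rng):
    """One randomized FGSM step inside the L-infinity ball of radius eps (hypothetical helper)."""
    # Random start: uniform noise inside the eps-ball
    delta = rng.uniform(-eps, eps, size=x.shape)
    grad = logistic_loss_grad_x(x + delta, y, w, b)
    # Ascend the loss along the sign of the input gradient
    delta = delta + alpha * np.sign(grad)
    # Project the perturbation back onto the eps-ball
    delta = np.clip(delta, -eps, eps)
    # Keep the perturbed input in the valid pixel range [0, 1]
    return np.clip(x + delta, 0.0, 1.0)

# Usage on random toy data (assumed values, not taken from the thesis)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(4, 8))
y = np.array([1.0, -1.0, 1.0, -1.0])
w = rng.normal(size=8)
b = 0.0
eps = 8 / 255
x_adv = randomized_fgsm(x, y, w, b, eps=eps, alpha=1.25 * eps, rng=rng)
```

For this linear toy model, the signed step always points in the loss-increasing direction. Each sample's margin `y * (w . x + b)` can therefore only shrink, while every pixel stays within eps of the original.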
Item Type: | Master Thesis |
---|---|
Published: | 2024 |
Creators: | März, Lars Steffen |
Type of entry: | Primary publication |
Title: | Robust Optimization for Adversarial Deep Learning |
Language: | English |
Referees: | Ulbrich, Prof. Dr. Stefan |
Date: | 7 March 2024 |
Place of Publication: | Darmstadt |
Collation: | 76 pages |
DOI: | 10.26083/tuprints-00026745 |
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/26745 |
Uncontrolled Keywords: | robust optimization, stochastic optimization, distributionally robust optimization, adversarial deep learning, adversarial examples, adversarial samples, adversarial robustness, deep learning, np-hard, np hard, np-hardness, np hardness, Danskin's Theorem, BOBYQA, FGSM, Fast Gradient Sign Method, PGD, Projected Gradient Descent, SGD, Stochastic Gradient Descent, black-box attack, white-box attack, image classification, ImageNet, CIFAR, CIFAR-10, gaussian smoothing, cross-entropy loss, CEL, ILSVRC, adversarial perturbation, p-norm, p norm, certified robustness, noise injection, randomized smoothing, gradient masking, gradient obfuscation, catastrophic overfitting, cosine similarity, step size, step length, harmonic, geometric, rescaling |
Status: | Publisher's Version |
URN: | urn:nbn:de:tuda-tuprints-267458 |
Classification DDC: | 000 Generalities, computers, information > 004 Computer science; 500 Science and mathematics > 510 Mathematics |
Divisions: | 20 Department of Computer Science; 20 Department of Computer Science > Artificial Intelligence and Machine Learning; 04 Department of Mathematics; 04 Department of Mathematics > Optimization; 04 Department of Mathematics > Optimization > Nonlinear Optimization; 04 Department of Mathematics > Stochastics |
Date Deposited: | 07 Mar 2024 12:38 |
Last Modified: | 12 Mar 2024 07:44 |