März, Lars Steffen (2024)
Robust Optimization for Adversarial Deep Learning.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00026745
Master's thesis, first publication, publisher's version
Abstract
Recent results demonstrated that images can be adversarially perturbed to a visually indistinguishable extent in order to misguide classifiers with high standard accuracy into making confident misclassifications. Adversarial examples can even be targeted to a class of the attacker's choosing and transfer between different DNNs in a black-box setting, meaning that perturbations computed on one DNN are likely to confuse other DNNs. This poses a concrete and acute security risk in digital domains like content moderation, but also in physical contexts like facial recognition and autonomous driving, where adversarial samples have been shown to survive printing and re-capturing. The phenomenon was first described by Szegedy et al. in 2014 and has been the subject of hundreds of papers ever since, both from an attacker's and a defender's point of view. There is no apparent end to the arms race of frequently published attacks and defenses, as no universal, provable, and practical prevention method has been developed yet. In this work, we show that verifying ReLU-based DNNs against adversarial examples is NP-hard. Furthermore, we model the adversarial training problem as a distributionally robust optimization problem to provide a formal framework for two of the most promising defenses so far: randomized FGSM-based adversarial training and randomized smoothing. Additionally, we propose two step size schemes for multi-step adversarial attacks that yield unprecedentedly low true-label confidences. To make p-norm-bounded attacks more comparable across different values of p, we define two norm rescaling functions and validate them on ImageNet. Moreover, by analyzing cosine similarities of model parameter gradients on ImageNet, we explain from an empirical data augmentation perspective why first-order adversarial training is successful despite lacking the mathematical guarantees of Danskin's Theorem. Finally, we update the performance results of Giughi et al. for BOBYQA black-box attacks on CIFAR-10 by exposing instances of the two aforementioned state-of-the-art defenses to this attack.
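To make the attack and defense setting of the abstract concrete, below is a minimal, illustrative PyTorch sketch of an l∞-bounded multi-step (PGD-style) attack and a majority-vote randomized-smoothing prediction. It is not the thesis's own code: the constant step size, the perturbation budget, the noise level, and the sample count are placeholder assumptions, and the thesis's proposed step-size schedules, norm rescaling functions, and certification procedure are not reproduced here.

```python
# Illustrative sketch only -- not the thesis's code. Assumes a classifier `model`
# returning logits for image batches with pixel values in [0, 1].
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, step=2 / 255, n_steps=10):
    """Multi-step l_inf attack with a constant step size (the thesis proposes schedules instead)."""
    x_adv = x.clone().detach()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)          # true-label cross-entropy loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()           # FGSM-style ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # project back into the eps-ball around x
            x_adv = x_adv.clamp(0.0, 1.0)                # stay a valid image
    return x_adv.detach()

def smoothed_predict(model, x, sigma=0.25, n_samples=100):
    """Majority-vote prediction over Gaussian-noised copies of one input x of shape (1, C, H, W)."""
    with torch.no_grad():
        noisy = x + sigma * torch.randn(n_samples, *x.shape[1:])   # broadcast to n_samples noisy copies
        preds = model(noisy).argmax(dim=1)                         # hard prediction per copy
    return preds.mode().values.item()                              # most frequent class
```

Randomized FGSM-based adversarial training in the sense of the abstract would train on batches perturbed by a single randomized FGSM step rather than the full multi-step loop above, while randomized smoothing replaces the plain forward pass at test time with the majority vote sketched here (the certified-radius computation is omitted).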
Item type: | Master's thesis |
---|---|
Published: | 2024 |
Author(s): | März, Lars Steffen |
Type of entry: | First publication |
Title: | Robust Optimization for Adversarial Deep Learning |
Language: | English |
Referees: | Ulbrich, Prof. Dr. Stefan |
Date of publication: | 7 March 2024 |
Place of publication: | Darmstadt |
Collation: | 76 pages |
DOI: | 10.26083/tuprints-00026745 |
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/26745 |
Uncontrolled keywords: | robust optimization, stochastic optimization, distributionally robust optimization, adversarial deep learning, adversarial examples, adversarial samples, adversarial robustness, deep learning, np-hard, np hard, np-hardness, np hardness, Danskin's Theorem, BOBYQA, FGSM, Fast Gradient Sign Method, PGD, Projected Gradient Descent, SGD, Stochastic Gradient Descent, black-box attack, white-box attack, image classification, ImageNet, CIFAR, CIFAR-10, gaussian smoothing, cross-entropy loss, CEL, ILSVRC, adversarial perturbation, p-norm, p norm, certified robustness, noise injection, randomized smoothing, gradient masking, gradient obfuscation, catastrophic overfitting, cosine similarity, step size, step length, harmonic, geometric, rescaling |
Status: | Publisher's version |
URN: | urn:nbn:de:tuda-tuprints-267458 |
Dewey Decimal Classification (DDC): | 000 Generalities, computer science, information science > 004 Computer science; 500 Natural sciences and mathematics > 510 Mathematics |
Department(s)/Division(s): | 20 Department of Computer Science; 20 Department of Computer Science > Artificial Intelligence and Machine Learning; 04 Department of Mathematics; 04 Department of Mathematics > Optimization; 04 Department of Mathematics > Optimization > Nonlinear Optimization; 04 Department of Mathematics > Stochastics |
Date deposited: | 07 Mar 2024 12:38 |
Last modified: | 12 Mar 2024 07:44 |