TU Darmstadt / ULB / TUbiblio

Robust Optimization for Adversarial Deep Learning

März, Lars Steffen (2024)
Robust Optimization for Adversarial Deep Learning.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00026745
Masterarbeit, Erstveröffentlichung, Verlagsversion

Kurzbeschreibung (Abstract)

Recent results demonstrated that images can be adversarially perturbed to a visually indistinguishable extent in order to misguide classifiers with high standard accuracy into making confident misclassifications. Adversarial examples may even be targeted to a class the attacker chooses and transfer between different DNNs in a black-box setting, meaning that perturbations computed on one DNN are likely to confuse other DNNs. This poses a concrete and acute security risk in digital domains like content moderation, but also in physical contexts like facial recognition and autonomous driving where adversarial samples proved to survive printing and re-capturing. The phenomenon was first discovered in 2014 by Szegedy et al. and has been subject of hundreds of papers ever since, both from an attacker's and a defender's point of view. There seems to be no apparent end to an arms race of frequently published attacks and defenses as no universal, provable and practical prevention method has been developed yet. In this work, we show that verifying ReLU-based DNNs against adversarial examples is NP-hard. Furthermore, we model the adversarial training problem as a distributionally robust optimization problem to provide a formal framework for two of the most promising defenses so far: Randomized FGSM-based adversarial training and randomized smoothing. Additionally, we propose two step size schemes for multi-step adversarial attacks that yield unprecedented low true-label-confidences. To make p-norm bounded attacks more comparable for different values of p, we define two norm rescaling functions before validating them on ImageNet. Moreover, we give an explanation as to why first-order adversarial training is successful from an empirical data augmentation perspective despite lacking the mathematical guarantees from Danskin's Theorem by analyzing cosine similarities of model parameter gradients on ImageNet. Finally, we give an update on the performance results from Giughi et al. of BOBYQA black-box attacks on CIFAR-10 by exposing instances of the two aforementioned state-of-the-art defenses to it.

Typ des Eintrags: Masterarbeit
Erschienen: 2024
Autor(en): März, Lars Steffen
Art des Eintrags: Erstveröffentlichung
Titel: Robust Optimization for Adversarial Deep Learning
Sprache: Englisch
Referenten: Ulbrich, Prof. Dr. Stefan
Publikationsjahr: 7 März 2024
Ort: Darmstadt
Kollation: 76 Seiten
DOI: 10.26083/tuprints-00026745
URL / URN: https://tuprints.ulb.tu-darmstadt.de/26745
Zugehörige Links:
Kurzbeschreibung (Abstract):

Recent results demonstrated that images can be adversarially perturbed to a visually indistinguishable extent in order to misguide classifiers with high standard accuracy into making confident misclassifications. Adversarial examples may even be targeted to a class the attacker chooses and transfer between different DNNs in a black-box setting, meaning that perturbations computed on one DNN are likely to confuse other DNNs. This poses a concrete and acute security risk in digital domains like content moderation, but also in physical contexts like facial recognition and autonomous driving where adversarial samples proved to survive printing and re-capturing. The phenomenon was first discovered in 2014 by Szegedy et al. and has been subject of hundreds of papers ever since, both from an attacker's and a defender's point of view. There seems to be no apparent end to an arms race of frequently published attacks and defenses as no universal, provable and practical prevention method has been developed yet. In this work, we show that verifying ReLU-based DNNs against adversarial examples is NP-hard. Furthermore, we model the adversarial training problem as a distributionally robust optimization problem to provide a formal framework for two of the most promising defenses so far: Randomized FGSM-based adversarial training and randomized smoothing. Additionally, we propose two step size schemes for multi-step adversarial attacks that yield unprecedented low true-label-confidences. To make p-norm bounded attacks more comparable for different values of p, we define two norm rescaling functions before validating them on ImageNet. Moreover, we give an explanation as to why first-order adversarial training is successful from an empirical data augmentation perspective despite lacking the mathematical guarantees from Danskin's Theorem by analyzing cosine similarities of model parameter gradients on ImageNet. Finally, we give an update on the performance results from Giughi et al. of BOBYQA black-box attacks on CIFAR-10 by exposing instances of the two aforementioned state-of-the-art defenses to it.

Freie Schlagworte: robust optimization, stochastic optimization, distributionally robust optimization, adversarial deep learning, adversarial examples, adversarial samples, adversarial robustness, deep learning, np-hard, np hard, np-hardness, np hardness, Danskin's Theorem, BOBYQA, FGSM, Fast Gradient Sign Method, PGD, Projected Gradient Descent, SGD, Stochastic Gradient Descent, black-box attack, white-box attack, image classification, ImageNet, CIFAR, CIFAR-10, gaussian smoothing, cross-entropy loss, CEL, ILSVRC, adversarial perturbation, p-norm, p norm, certified robustness, noise injection, randomized smoothing, gradient masking, gradient obfuscation, catastrophic overfitting, cosine similarity, step size, step length, harmonic, geometric, rescaling
Status: Verlagsversion
URN: urn:nbn:de:tuda-tuprints-267458
Sachgruppe der Dewey Dezimalklassifikatin (DDC): 000 Allgemeines, Informatik, Informationswissenschaft > 004 Informatik
500 Naturwissenschaften und Mathematik > 510 Mathematik
Fachbereich(e)/-gebiet(e): 20 Fachbereich Informatik
20 Fachbereich Informatik > Künstliche Intelligenz und Maschinelles Lernen
04 Fachbereich Mathematik
04 Fachbereich Mathematik > Optimierung
04 Fachbereich Mathematik > Optimierung > Nonlinear Optimization
04 Fachbereich Mathematik > Stochastik
Hinterlegungsdatum: 07 Mär 2024 12:38
Letzte Änderung: 12 Mär 2024 07:44
PPN:
Referenten: Ulbrich, Prof. Dr. Stefan
Export:
Suche nach Titel in: TUfind oder in Google
Frage zum Eintrag Frage zum Eintrag

Optionen (nur für Redakteure)
Redaktionelle Details anzeigen Redaktionelle Details anzeigen