Robust Optimization for Adversarial Deep Learning

März, Lars Steffen (2024)
Robust Optimization for Adversarial Deep Learning.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00026745
Master Thesis, Primary publication, Publisher's Version

Abstract

Recent results have demonstrated that images can be adversarially perturbed to a visually indistinguishable extent in order to misguide classifiers with high standard accuracy into making confident misclassifications. Adversarial examples may even be targeted to a class of the attacker's choosing and transfer between different DNNs in a black-box setting, meaning that perturbations computed on one DNN are likely to confuse other DNNs. This poses a concrete and acute security risk in digital domains like content moderation, but also in physical contexts like facial recognition and autonomous driving, where adversarial samples have been shown to survive printing and re-capturing. The phenomenon was first discovered in 2014 by Szegedy et al. and has been the subject of hundreds of papers ever since, both from an attacker's and a defender's point of view. There is no apparent end to the arms race of frequently published attacks and defenses, as no universal, provable, and practical prevention method has been developed yet. In this work, we show that verifying ReLU-based DNNs against adversarial examples is NP-hard. Furthermore, we model the adversarial training problem as a distributionally robust optimization problem to provide a formal framework for two of the most promising defenses so far: randomized FGSM-based adversarial training and randomized smoothing. Additionally, we propose two step size schemes for multi-step adversarial attacks that yield unprecedentedly low true-label confidences. To make p-norm bounded attacks more comparable across different values of p, we define two norm rescaling functions and validate them on ImageNet. Moreover, by analyzing cosine similarities of model parameter gradients on ImageNet, we explain why first-order adversarial training is successful from an empirical data augmentation perspective despite lacking the mathematical guarantees of Danskin's Theorem. Finally, we update the performance results of Giughi et al. for BOBYQA black-box attacks on CIFAR-10 by exposing instances of the two aforementioned state-of-the-art defenses to the attack.
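As an illustrative aside (the thesis's own code is not reproduced here), the following minimal PyTorch sketch shows the single gradient-sign step that FGSM-based attacks and randomized FGSM-based adversarial training build on; the classifier `model`, the perturbation budget `eps`, and the pixel range [0, 1] are assumptions made for the example.

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, eps):
        """One Fast Gradient Sign Method step of size eps inside an L-infinity ball."""
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)  # cross-entropy loss w.r.t. the true labels
        loss.backward()
        x_adv = x_adv + eps * x_adv.grad.sign()  # ascend the loss along the gradient sign
        return x_adv.clamp(0.0, 1.0).detach()    # keep pixels in the valid range

Likewise, a randomized-smoothing classifier predicts by majority vote of a base classifier over Gaussian perturbations of the input. The sketch below is a generic Monte-Carlo approximation with an assumed noise level `sigma` and sample count `n_samples`; it is not the certified procedure evaluated in the thesis.

    def smoothed_predict(model, x, sigma, n_samples=100):
        """Majority vote of model over n_samples Gaussian perturbations of x."""
        votes = None
        with torch.no_grad():
            for _ in range(n_samples):
                logits = model(x + sigma * torch.randn_like(x))  # inject isotropic Gaussian noise
                one_hot = F.one_hot(logits.argmax(dim=1), logits.shape[1])
                votes = one_hot if votes is None else votes + one_hot
        return votes.argmax(dim=1)  # most frequently predicted class per input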

Item Type: Master Thesis
Published: 2024
Creators: März, Lars Steffen
Type of entry: Primary publication
Title: Robust Optimization for Adversarial Deep Learning
Language: English
Referees: Ulbrich, Prof. Dr. Stefan
Date: 7 March 2024
Place of Publication: Darmstadt
Collation: 76 pages
DOI: 10.26083/tuprints-00026745
URL / URN: https://tuprints.ulb.tu-darmstadt.de/26745
Uncontrolled Keywords: robust optimization, stochastic optimization, distributionally robust optimization, adversarial deep learning, adversarial examples, adversarial samples, adversarial robustness, deep learning, np-hard, np hard, np-hardness, np hardness, Danskin's Theorem, BOBYQA, FGSM, Fast Gradient Sign Method, PGD, Projected Gradient Descent, SGD, Stochastic Gradient Descent, black-box attack, white-box attack, image classification, ImageNet, CIFAR, CIFAR-10, gaussian smoothing, cross-entropy loss, CEL, ILSVRC, adversarial perturbation, p-norm, p norm, certified robustness, noise injection, randomized smoothing, gradient masking, gradient obfuscation, catastrophic overfitting, cosine similarity, step size, step length, harmonic, geometric, rescaling
Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-267458
Classification DDC: 000 Generalities, computers, information > 004 Computer science
500 Science and mathematics > 510 Mathematics
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Artificial Intelligence and Machine Learning
04 Department of Mathematics
04 Department of Mathematics > Optimization
04 Department of Mathematics > Optimization > Nonlinear Optimization
04 Department of Mathematics > Stochastics
Date Deposited: 07 Mar 2024 12:38
Last Modified: 12 Mar 2024 07:44