Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

ICML 2020


Jul 12, 2020
The robustness of neural network-based classifiers against adversarial manipulation is mainly evaluated with empirical attacks, as methods for the exact computation, even when available, do not scale to large networks. In this paper we propose a new white-box adversarial attack with respect to the $l_p$-norms for $p \in \{1,2,\infty\}$, aiming at finding the minimal perturbation necessary to change the class of a given input. The attack has an intuitive geometric meaning, yields high-quality results already with one restart, minimizes the size of the perturbation so that robust accuracy can be evaluated at all possible thresholds with a single run, and has almost no free parameters apart from the number of iterations and restarts. It achieves better or similar robust test accuracy compared to state-of-the-art attacks, which are partially specialized to one $l_p$-norm.

Speakers: Francesco Croce, Matthias Hein
