Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples. However, the adversarially trained models do not perform well enough on test data or under other attack algorithms unseen during training, which remains to be improved. In this paper, we introduce a novel adversarial distributional training (ADT) framework for learning robust models. Specifically, we formulate ADT as a minimax optimization problem, where the inner maximization aims to learn an adversarial distribution to characterize the potential adversarial examples around a natural one, and the outer minimization aims to train robust classifiers by minimizing the expected loss over the worst-case adversarial distributions. We conduct a theoretical analysis on how to solve the minimax problem, leading to a general algorithm for ADT. We further propose three different approaches to parameterize the adversarial distributions. Empirical results on various benchmarks validate the effectiveness of ADT compared with the state-of-the-art AT methods.
Speakers: Yinpeng Dong, Zhijie Deng, Tianyu Pang, Hang Su, Jun Zhu