Interpretability is a Kind of Safety: An Interpreter-based Ensemble for Adversary Defense
Aug 13, 2020
While deep neural network (DNN) models have achieved great success in rich real-life applications, they have long been criticized for their vulnerability to adversarial attacks. Tremendous research effort has been dedicated to mitigating the threats of adversarial attacks, but the essential trait of adversarial examples is not yet clear, and most existing methods remain vulnerable to hybrid attacks and suffer from counterattacks. In light of this, in this paper we first reveal a gradient-based correlation between sensitivity-analysis-based DNN interpreters and the generation process of adversarial examples, which points to the Achilles’ heel of adversarial attacks and sheds light on linking together two long-standing challenges of DNNs: fragility and unexplainability. We then propose an interpreter-based ensemble framework called X-Ensemble for robust adversary defense. X-Ensemble adopts a novel detection-rectification process and builds multiple sub-detectors and a rectifier upon various types of interpretation information about the target classifiers. Moreover, X-Ensemble employs a Random Forests (RF) model to combine the sub-detectors into an ensemble detector that defends against hybrid adversarial attacks. The non-differentiable property of RF further makes it a valuable choice against counterattacks by adversaries. Extensive experiments under various types of state-of-the-art attacks and diverse attack scenarios demonstrate the advantages of X-Ensemble over competitive baseline methods.
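The gradient-based correlation mentioned above can be illustrated on a toy model. The sketch below is not X-Ensemble or the paper's derivation; it assumes a binary logistic-regression classifier, uses the vanilla-gradient saliency map as the sensitivity-analysis interpreter, and uses FGSM (Fast Gradient Sign Method) as the attack. These choices and all function names (`saliency`, `fgsm`, `loss_grad`, `class_prob`) are hypothetical and only for illustration. Under these assumptions, the FGSM step equals `-eps` times the sign of the saliency map of the true class, so the interpreter's output reveals the attack direction.

```python
import numpy as np

# Illustrative toy setup (assumption): a binary logistic-regression "DNN".

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def class_prob(w, b, x, label):
    """Probability the toy model assigns to `label` (0 or 1)."""
    p1 = sigmoid(w @ x + b)
    return p1 if label == 1 else 1.0 - p1

def saliency(w, b, x, label):
    """Vanilla-gradient saliency map: d P(label | x) / dx."""
    p1 = sigmoid(w @ x + b)
    grad_p1 = p1 * (1.0 - p1) * w  # derivative of sigmoid(w.x + b) w.r.t. x
    return grad_p1 if label == 1 else -grad_p1

def loss_grad(w, b, x, label):
    """Gradient of the binary cross-entropy loss w.r.t. the input x."""
    p1 = sigmoid(w @ x + b)
    return (p1 - label) * w

def fgsm(w, b, x, label, eps):
    """FGSM: a single step of size eps along the sign of the loss gradient."""
    return x + eps * np.sign(loss_grad(w, b, x, label))

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.1
x = rng.normal(size=8)
label = int(sigmoid(w @ x + b) > 0.5)  # use the model's own prediction as the label

s = saliency(w, b, x, label)
x_adv = fgsm(w, b, x, label, eps=0.1)

# The attack perturbation is exactly -eps * sign(saliency of the true class).
print(np.allclose(x_adv - x, -0.1 * np.sign(s)))
# The perturbation lowers the model's confidence in the true class.
print(class_prob(w, b, x_adv, label) < class_prob(w, b, x, label))
```

For this linear toy model both lines print `True` for either label. In deep networks the relationship is not an exact identity, so this should be read as intuition for why interpretation information can help detect adversarial examples, not as a statement of the paper's result.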