It is well known that neural networks are susceptible to adversarial perturbations and are also computationally and memory intensive which makes it difficult to deploy them in real-world applications where security and computation are constrained. In this work, we aim to obtain both robust and sparse networks that are applicable to such scenarios, based on the intuition that latent features have a varying degree of susceptibility to adversarial perturbations. Specifically, we define vulnerability at the latent feature space and then propose a Bayesian framework to prioritize features based on their contribution to both the original and adversarial loss, to prune vulnerable features and preserve the robust ones. Through quantitative evaluation and qualitative analysis of the perturbation to latent features, we show that our sparsification method is a defense mechanism against adversarial attacks and the robustness indeed comes from our model's ability to prune vulnerable latent features that are more susceptible to adversarial perturbations.
Speakers: Divyam Madaan, Jinwoo Shin, Sung Ju Hwang