AdvMind: Inferring Adversary Intent of Black-Box Attacks
Aug 13, 2020
Deep neural networks (DNNs) are inherently susceptible to adversarial attacks even under black-box settings, in which the adversary only has query access to the target model. In practice, while it may be possible to effectively detect such attacks (e.g., observing massive similar but non-identical queries), it is often challenging to exactly infer the adversary’s intent (e.g., the target class of the adversarial example the adversary attempts to craft), especially during early stages of the attacks, which is crucial for performing effective deterrence and remediation of the threats in many scenarios. In this paper, we present AdvMind, a new class of estimation models that infer the adversary intent of black-box adversarial attacks in a robust and prompt manner. Specifically, to achieve robust detection, AdvMind accounts for the adversary adaptiveness such that her attempt to conceal the target will significantly increase the attack cost (e.g., the number of queries); to achieve prompt detection, AdvMind proactively synthesizes plausible query results to solicit subsequent queries from the adversary that maximally expose her intent. Through extensive empirical evaluation on benchmark datasets and state-of-the-art black-box attacks, we demonstrate that on average AdvMind detects the adversary intent with over 75% accuracy after observing less than 3 query batches and meanwhile increases the cost of adaptive attacks by over 60%. We further discuss the possible synergy between AdvMind and other defenses against black-box adversarial attacks, pointing to several promising research directions.