Xi Wu, Uyeong Jang, Lingjiao Chen, Somesh Jha. The Manifold Assumption and Defenses Against Adversarial Perturbations. OpenReview, 2018. https://openreview.net/forum?id=Hk-FlMbAZ.

Wu et al. propose a defense against adversarial examples based on the observation that (deep) neural networks learn per-class confidence regions. In particular, their work rests on the assumption that neural networks learn separate manifolds for different classes. On these manifolds – according to the assumption – neural networks learn confidence regions where samples can be classified with high confidence. In a “good” model, these confidence regions should be – like the corresponding manifolds – well separated. They argue, theoretically, that adversarial training as employed by Madry et al. [1] helps to learn such “good” models – i.e. models for which the probability of finding a high-confidence adversarial example decreases. Taking confidence information into account, they propose a simple defense strategy: given a trained model (e.g. obtained through adversarial training) and a test sample, classify the sample according to the class of its most confident neighbor. This involves searching the neighborhood for confident examples; in practice, they employ a strategy similar to the Carlini-Wagner attack [2]. In experiments, they show that this defense strategy can significantly reduce the impact of adversarial attacks. Additionally, this work further highlights the “manifold interpretation” of adversarial examples, i.e. the idea that the data manifold plays an important role when reasoning about adversarial examples.
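The nearest-confident-neighbor idea can be sketched in a few lines. The snippet below is a simplified illustration, not the paper's implementation: a toy linear softmax model stands in for a trained network, and plain random search within an ε-ball stands in for the Carlini-Wagner-style optimization the authors actually use to find confident neighbors. All names (`confidences`, `most_confident_neighbor_class`, `eps`, `n_samples`) are my own for this sketch.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy linear "network": logits = W x + b. A stand-in for a trained
# (e.g. adversarially trained) model with 3 classes and 5 input features.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)

def confidences(x):
    return softmax(W @ x + b)

def most_confident_neighbor_class(x, eps=0.5, n_samples=200):
    """Classify x by the class of the most confident point found in its
    eps-neighborhood. Random search replaces the Carlini-Wagner-style
    search used in the paper; this is only a sketch of the idea."""
    p = confidences(x)
    best_conf, best_cls = p.max(), p.argmax()
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        p = confidences(x + delta)
        if p.max() > best_conf:
            best_conf, best_cls = p.max(), p.argmax()
    return int(best_cls)

x = rng.normal(size=5)
print(most_confident_neighbor_class(x))
```

Note that with `eps=0` the defense reduces to ordinary classification; larger neighborhoods trade off robustness against the risk of snapping to a different class's confidence region.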

  • [1] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu. Towards deep learning models resistant to adversarial attacks. CoRR, 2017.
  • [2] N. Carlini, D. A. Wagner. Towards evaluating the robustness of neural networks. SP, 2017.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.