Adversarial examples are commonly assumed to leave the manifold of the underlying data, although this assumption has not been verified experimentally so far. It implies that deep neural networks perform well on the manifold, but small perturbations in directions leaving the manifold can cause misclassification. In this article, based on my recent CVPR'19 paper, I want to empirically show that adversarial examples indeed leave the manifold. To this end, I present results on a synthetic dataset with a known manifold as well as on MNIST with an approximated manifold; a small sketch of the idea follows below.
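To make the idea concrete, here is a minimal sketch (not the paper's code) of how one could measure whether a perturbed example leaves a manifold that is known analytically. The circle manifold and the helper function are illustrative assumptions only; the off-manifold distance is simply the norm of the residual after projecting back onto the manifold.

```python
import numpy as np


def distance_to_circle_manifold(x):
    """Distance of x (shape [d]) to the unit circle lying in the first two
    coordinates of R^d; the remaining coordinates are off-manifold directions."""
    in_plane = x[:2]
    projection = np.zeros_like(x)
    norm = np.linalg.norm(in_plane)
    # Project onto the circle: normalize the in-plane part, zero the rest.
    projection[:2] = in_plane / norm if norm > 0 else np.array([1.0, 0.0])
    return np.linalg.norm(x - projection)


# Hypothetical usage: compare a clean example with a perturbed one.
clean = np.array([np.cos(0.3), np.sin(0.3), 0.0, 0.0])  # lies on the manifold
delta = np.array([0.0, 0.0, 0.1, -0.05])                # off-manifold perturbation
print(distance_to_circle_manifold(clean))          # ~0
print(distance_to_circle_manifold(clean + delta))  # > 0, i.e., leaves the manifold
```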
The code for my latest paper on confidence-calibrated adversarial training has been released on GitHub. The repository includes not only a PyTorch implementation of confidence-calibrated adversarial training, but also several white- and black-box attacks for generating adversarial examples, as well as the proposed confidence-thresholded robust test error. Furthermore, these implementations are fully tested and allow the results from the paper to be reproduced. This article gives an overview of the repository and highlights its features and components.
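To give a flavor of the white-box attacks included in the repository, below is a minimal sketch of a standard L&#8734; PGD attack in PyTorch. The function and argument names are illustrative assumptions and do not reflect the repository's actual API.

```python
import torch
import torch.nn.functional as F


def pgd_linf(model, images, labels, epsilon=0.03, alpha=0.01, steps=10):
    """Projected gradient descent within an L_inf ball of radius epsilon.

    Illustrative sketch only; the repository's attack interface may differ.
    """
    adversarial = images.clone().detach()
    for _ in range(steps):
        adversarial.requires_grad_(True)
        loss = F.cross_entropy(model(adversarial), labels)
        grad = torch.autograd.grad(loss, adversarial)[0]
        with torch.no_grad():
            # Ascend the loss and project back into the epsilon-ball around the inputs.
            adversarial = adversarial + alpha * grad.sign()
            adversarial = torch.min(torch.max(adversarial, images - epsilon), images + epsilon)
            adversarial = adversarial.clamp(0, 1).detach()
    return adversarial
```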