Adversarial examples are commonly assumed to leave the manifold of the underlying data, although this has not been confirmed experimentally so far. Under this assumption, deep neural networks perform well on the manifold, but slight perturbations in directions leaving the manifold may cause mis-classification. In this article, based on my recent CVPR’19 paper, I want to show empirically that adversarial examples indeed leave the manifold. For this purpose, I will present results on a synthetic dataset with a known manifold as well as on MNIST with an approximated manifold.
The code for my latest paper on confidence-calibrated adversarial training has been released on GitHub. The repository includes not only a PyTorch implementation of confidence-calibrated adversarial training, but also several white- and black-box attacks to generate adversarial examples, as well as the proposed confidence-thresholded robust test error. Furthermore, these implementations are fully tested and make it possible to reproduce the results from the paper. This article gives an overview of the repository and highlights its features and components.
Recently, I had the opportunity to present my work on confidence-calibrated adversarial training at the Bosch Center for Artificial Intelligence and at the University of Tübingen, specifically the newly formed Tübingen AI Center. As part of the talk, I outlined the motivation and strengths of confidence-calibrated adversarial training compared to standard adversarial training: robustness against previously unseen attacks and improved accuracy. I also touched on the difficulties faced during robustness evaluation. This article provides the corresponding slides and gives a short overview of the talk.
Adversarial training yields models that are robust against a specific threat model. However, this robustness does not generalize to larger perturbations or to threat models not seen during training. Confidence-calibrated adversarial training tackles this problem by biasing the network towards low-confidence predictions on adversarial examples. By rejecting low-confidence (adversarial) examples, robustness generalizes to various threat models, including L2, L1 and L0, while training only on L∞ adversarial examples. This article gives a short abstract, discusses relevant updates to the previous version and includes the paper and appendix.
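To make the rejection step concrete, here is a minimal sketch in plain Python. It is not the implementation from the repository and not the confidence-thresholded robust test error from the paper; it only illustrates the idea of discarding predictions whose confidence falls below a threshold. The function names `reject_low_confidence` and `error_on_kept`, the input format (one list of class probabilities per example) and the thresholds used are assumptions for illustration.

```python
def reject_low_confidence(probabilities, threshold):
    """Return predicted labels and a keep-mask for a batch of predictions.

    probabilities: list of per-example lists of class probabilities
                   (hypothetical input format, e.g. softmax outputs).
    threshold:     minimum confidence required to keep a prediction.
    """
    predictions, kept = [], []
    for row in probabilities:
        confidence = max(row)                    # highest class probability
        predictions.append(row.index(confidence))  # index of that class
        kept.append(confidence >= threshold)     # False means "rejected"
    return predictions, kept


def error_on_kept(probabilities, labels, threshold):
    """Fraction of mis-classified examples among those not rejected.

    A simplified stand-in, not the metric defined in the paper.
    Returns 0.0 if every example is rejected.
    """
    predictions, kept = reject_low_confidence(probabilities, threshold)
    pairs = [(p, y) for p, y, k in zip(predictions, labels, kept) if k]
    if not pairs:
        return 0.0
    return sum(p != y for p, y in pairs) / len(pairs)
```

For example, with probabilities `[[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]` and labels `[0, 1, 1]`, a threshold of 0.7 rejects the second (mis-classified, low-confidence) example, so the error on the remaining examples is 0.0, whereas a threshold of 0.5 keeps all three and gives an error of one third.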
Adversarial examples, imperceptibly perturbed examples causing mis-classification, are commonly assumed to lie off the underlying manifold of the data, the so-called manifold assumption. In this article, following my recent CVPR’19 paper, I demonstrate that adversarial examples can also be found on the data manifold, both on a synthetic dataset and on MNIST and Fashion-MNIST.
In the last few months, at least 50 papers per month related to adversarial examples have appeared on arXiv alone. While not all of them might meet the high bar of conferences such as ICLR, ICML or NeurIPS regarding their contributions and experiments, it is becoming more and more difficult to stay on top of the literature. In this article, I want to share a categorized list of more than 240 papers on adversarial examples and related topics.