Adversarial training is the de-facto standard to obtain models robust against adversarial examples. However, on complex datasets, a significant loss in accuracy is incurred and the robustness does not generalize to attacks not used during training. This paper introduces confidence-calibrated adversarial training. By forcing the confidence on adversarial examples to decay with their distance to the training data, the loss in accuracy is reduced and robustness generalizes to other attacks and larger perturbations.
Obtaining deep networks robust against adversarial examples is a widely open problem. While many papers are devoted to training more robust deep networks, a clear definition of adversarial examples has not been agreed upon. In this article, I want to discuss two very simple toy examples illustrating the necessity of a proper definition of adversarial examples.
To date, it is unclear whether we can obtain both accurate and robust deep networks — meaning deep networks that generalize well and resist adversarial examples. In this pre-print, we aim to disentangle the relationship between adversarial robustness and generalization. The paper is available on ArXiv.