ArXiv Pre-Print “Confidence-Calibrated Adversarial Training”

Adversarial training is the de facto standard for obtaining models that are robust against adversarial examples. However, on complex datasets, it incurs a significant loss in accuracy, and the robustness does not generalize to attacks not used during training. This paper introduces confidence-calibrated adversarial training. By forcing the confidence on adversarial examples to decay with their distance to the training data, the loss in accuracy is reduced and robustness generalizes to other attacks and larger perturbations.

November 26, 2019. The paper has been updated, including a more thorough description of the experimental setup and evaluation metrics and presenting additional experiments with $L_0$ and $L_1$ attacks.


Figure 1: The effect of confidence calibration on adversarial training. Left: confidence per class along an adversarial direction for adversarial training (AT) and the proposed confidence-calibrated adversarial training (CCAT). Right: confidence histogram for test and adversarial examples.

Adversarial training is the standard approach for training models that are robust against adversarial examples. However, especially for complex datasets, adversarial training incurs a significant loss in accuracy and is known to generalize poorly to stronger attacks, e.g., larger perturbations or other threat models. In this paper, we introduce confidence-calibrated adversarial training (CCAT), where the key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples. We show that CCAT better preserves the accuracy of normal training, while robustness against adversarial examples is achieved via confidence thresholding. Most importantly, in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models not encountered during training. We also discuss our extensive work to design strong adaptive attacks against CCAT and standard adversarial training, which is of independent interest. We present experimental results on MNIST, SVHN and CIFAR10.
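The two ingredients named in the abstract, confidence that decays with distance and rejection by confidence thresholding, can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the paper's implementation: the power-law decay, the use of the $L_\infty$ norm, and all names (`decayed_target`, `reject_low_confidence`, `epsilon`, `rho`) are hypothetical choices made here for clarity.

```python
import numpy as np


def decayed_target(label, num_classes, delta, epsilon, rho=10.0):
    """Interpolate between a one-hot target and the uniform distribution.

    Assumption: the weight on the true label follows a power law in the
    normalized L-infinity norm of the perturbation `delta`. The names and
    the exact form are illustrative only.
    """
    # Distance of the adversarial example from the attacked example,
    # normalized by the perturbation budget and clipped to [0, 1].
    distance = min(1.0, float(np.max(np.abs(delta))) / epsilon)
    # Weight on the true label: 1 at distance 0, 0 at distance >= epsilon.
    lam = (1.0 - distance) ** rho
    one_hot = np.eye(num_classes)[label]
    uniform = np.full(num_classes, 1.0 / num_classes)
    return lam * one_hot + (1.0 - lam) * uniform


def reject_low_confidence(probabilities, threshold):
    """Return a boolean mask that is True where a prediction is rejected.

    `probabilities` has shape (batch, num_classes); the confidence of a
    prediction is its largest class probability.
    """
    confidence = np.max(probabilities, axis=1)
    return confidence < threshold
```

With targets of this kind, an unperturbed example keeps its one-hot label, while an example at the edge of the perturbation budget is pushed toward a uniform prediction. At test time, `reject_low_confidence` would then flag inputs whose maximum class probability falls below a threshold chosen on held-out clean data.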

Paper on ArXiv
    author    = {David Stutz and Matthias Hein and Bernt Schiele},
    title     = {Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training},
    journal   = {CoRR},
    volume    = {abs/1910.06259},
    year      = {2019}