ICML Paper “Confidence-Calibrated Adversarial Training”

Our paper on confidence-calibrated adversarial training was accepted at ICML’20. In the revised paper, the proposed confidence-calibrated adversarial training tackles the problem of obtaining robustness that generalizes to attacks not seen during training. This is achieved by biasing the network towards low-confidence predictions on adversarial examples and rejecting these low-confidence examples at test time. This article gives a short abstract and includes paper and code.

A short version of this paper was accepted at UDL 2020.

Project page, including code and pre-trained models, can be found here.


Adversarial training yields robust models against a specific threat model, e.g., $L_\infty$ adversarial examples. Typically robustness does not generalize to previously unseen threat models, e.g., other $L_p$ norms, or larger perturbations. Our confidence-calibrated adversarial training (CCAT) tackles this problem by biasing the model towards low confidence predictions on adversarial examples. By allowing to reject examples with low confidence, robustness generalizes beyond the threat model employed during training. CCAT, trained only on $L_\infty$ adversarial examples, increases robustness against larger $L_\infty$, $L_2$, $L_1$ and $L_0$ attacks, adversarial frames, distal adversarial examples and corrupted examples and yields better clean accuracy compared to adversarial training. For thorough evaluation we developed novel white- and black-box attacks directly attacking CCAT by maximizing confidence. For each threat model, we use $7$ attacks with up to $50$ restarts and $5000$ iterations and report worst-case robust test error, extended to our confidence-thresholded setting, across all attacks.

Revised Paper on ArXiv
    author    = {David Stutz and Matthias Hein and Bernt Schiele},
    title     = {Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks},
    journal   = {Proceedings of the International Conference on Machine Learning {ICML}},
    year      = {2020}

Code and Pre-Trained Models

The code for training and evaluation (attacks and metrics) including pre-trained models is available on GitHub:

Code on GitHub

The repository includes installation instructions and documentation.

What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.