Code Released: Confidence-Calibrated Adversarial Training

The code for my latest paper on confidence-calibrated adversarial training has been released on GitHub. The repository not only includes a PyTorch implementation of confidence-calibrated adversarial training, but also several white- and black-box attacks for generating adversarial examples, as well as the proposed confidence-thresholded robust test error. Furthermore, these implementations are fully tested and allow reproducing the results from the paper. This article gives an overview of the repository and highlights its features and components.


Confidence-calibrated adversarial training (CCAT) biases the network towards low-confidence predictions on adversarial examples. As a result, even though it is trained only on $L_\infty$ adversarial examples, it improves robustness against previously unseen attacks, such as other $L_p$ adversarial examples or larger perturbations. Thus, CCAT addresses an important drawback of standard adversarial training [1].
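In a nutshell, CCAT trains the network to predict a target distribution that interpolates between the one-hot label and the uniform distribution, depending on the size of the perturbation. The following minimal PyTorch sketch illustrates this target using a power transition function; the function name, signature, and default exponent are illustrative and do not correspond to the repository's API:

```python
import torch


def ccat_target(one_hot, perturbation, epsilon, rho=10.0):
    """Illustrative CCAT target distribution: a convex combination of the
    one-hot label and the uniform distribution, where the trade-off
    depends on the L_inf norm of the perturbation.

    For a zero perturbation the target is the true label; for
    perturbations at the boundary of the epsilon-ball it approaches the
    uniform distribution, i.e., minimal confidence.
    """
    num_classes = one_hot.size(1)
    # per-example L_inf norm of the perturbations in the batch
    norms = perturbation.flatten(1).norm(p=float('inf'), dim=1)
    # power transition: 1 for zero perturbation, 0 at the ball's boundary
    lam = (1 - torch.clamp(norms / epsilon, max=1)) ** rho
    lam = lam.view(-1, 1)
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    return lam * one_hot + (1 - lam) * uniform
```

Training then minimizes cross-entropy against this soft target on adversarial examples, instead of the hard label used by standard adversarial training.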

The code for confidence-calibrated adversarial training is now available on GitHub:

Confidence-Calibrated Adversarial Training on GitHub

The corresponding paper is available on ArXiv; also check out the project page:

Paper on ArXiv
@inproceedings{stutz2020ccat,
    author    = {David Stutz and Matthias Hein and Bernt Schiele},
    title     = {Confidence-Calibrated Adversarial Training: Generalizing Robustness to Unseen Attacks},
    booktitle = {ICML},
    year      = {2020}
}

The repository includes the following implementations and features:

  • Training procedures for:
    • Adversarial Training [1]
    • Confidence-Calibrated Adversarial Training
  • Various white- and black-box adversarial attacks:
    • PGD [1] with backtracking
    • (Reference implementation of PGD without backtracking)
    • Corner Search [2]
    • Query Limited [3] with backtracking
    • ZOO [4] with backtracking
    • Adversarial Frames [5]
    • Geometry [6]
    • Square [7]
  • Confidence-thresholded evaluation protocol for:
    • adversarial examples
    • distal adversarial examples
    • out-of-distribution examples
    • corrupted examples
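The idea behind the confidence-thresholded protocol is to let the model reject low-confidence examples before computing the (robust) test error: the threshold is calibrated on clean examples, for example such that at most 1% of them are rejected. The following simplified NumPy sketch conveys the idea; function names are illustrative, and the paper's actual definition is more careful, e.g., regarding clean errors and per-example worst cases:

```python
import numpy as np


def confidence_threshold(clean_confidences, reject_rate=0.01):
    """Pick a threshold tau such that roughly `reject_rate` of the
    clean examples fall below it and would be rejected."""
    return np.quantile(clean_confidences, reject_rate)


def thresholded_robust_test_error(labels, adv_predictions, adv_confidences, tau):
    """Fraction of test examples whose adversarial example is both
    confident enough to pass the threshold and misclassified."""
    errors = (adv_predictions != labels) & (adv_confidences >= tau)
    return errors.mean()
```

With this protocol, an attack only counts as successful if it produces a misclassification with high confidence; low-confidence adversarial examples, which CCAT encourages, are rejected.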

More features:

  • All attacks follow a common interface, allow different objectives, initializations, and $L_p$ norms, and operate on batches.
  • Adversarial training supports any of the included attacks as well as a variable fraction (for example, 100% or 50%) of adversarial examples per batch.
  • Confidence-calibrated adversarial training supports any of the included attacks as well as different losses and transition functions.
  • Training supports data augmentation through imgaug and custom data loaders.
  • Evaluation includes per-example worst-case analysis and multiple restarts per attack.
  • Utilities, attacks and training are tested!
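To illustrate what such a common attack interface can look like, here is a minimal sketch of a batched $L_\infty$ PGD attack that takes the objective as an argument; the class and method names are hypothetical and do not match the repository's API, and backtracking is omitted for brevity:

```python
import torch


class BatchPGD:
    """Illustrative batched L_inf PGD attack behind a generic interface;
    names and signatures are hypothetical, not the repository's API."""

    def __init__(self, epsilon, step_size, iterations):
        self.epsilon = epsilon
        self.step_size = step_size
        self.iterations = iterations

    def run(self, model, images, objective):
        # random initialization inside the L_inf epsilon-ball
        delta = torch.empty_like(images).uniform_(-self.epsilon, self.epsilon)
        delta.requires_grad_(True)
        for _ in range(self.iterations):
            # the objective maps logits to a scalar to be maximized
            loss = objective(model(images + delta))
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                # signed gradient ascent step
                delta += self.step_size * grad.sign()
                # project back onto the epsilon-ball and the image domain
                delta.clamp_(-self.epsilon, self.epsilon)
                delta.copy_((images + delta).clamp(0, 1) - images)
        return delta.detach()
```

An untargeted attack would, for example, pass `objective = lambda logits: torch.nn.functional.cross_entropy(logits, labels)`; swapping the objective or the projection is what makes such an interface flexible across attacks and norms.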


  • [1] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. ICLR, 2018.
  • [2] Croce, F. and Hein, M. Sparse and imperceivable adversarial attacks. arXiv.org, abs/1909.05040, 2019.
  • [3] Ilyas, A., Engstrom, L., Athalye, A., and Lin, J. Black-box adversarial attacks with limited queries and information. ICML, 2018.
  • [4] Chen, P., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C. ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. AISec@CCS, 2017.
  • [5] Zajac, M., Zolna, K., Rostamzadeh, N., and Pinheiro, P. O. Adversarial framing for image and video classification. AAAI Workshops, 2019.
  • [6] Khoury, M. and Hadfield-Menell, D. On the geometry of adversarial examples. arXiv.org, abs/1811.00525, 2018.
  • [7] Andriushchenko, M., Croce, F., Flammarion, N., and Hein, M. Square attack: a query-efficient black-box adversarial attack via random search. arXiv.org, abs/1912.00049, 2019.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.