Updated Results for Confidence-Calibrated Adversarial Training

Since I worked on confidence-calibrated adversarial training (CCAT) some years ago, CCAT has been evaluated using novel attacks. In this article, I want to share updated results and contrast the originally reported numbers with newer experiments that I ran.

Some of these results can also be found in my PhD thesis in Section 7.5.

Since I worked on CCAT, several new attacks have been proposed and some of the corresponding papers also report updated numbers. For example, Adaptive AutoAttack (AAA) [YBTV21] automatically searches for adaptive attacks, considering various attack types, hyper-parameters, and objectives. This also includes objectives that can deal with the reduced confidence that CCAT imposes on adversarial examples. Similarly, [Sch22] adapts the projected gradient descent (PGD) based attack used during CCAT training with an improved backtracking scheme. So, in the following, I want to present some updated numbers. Remember that CCAT was evaluated using a confidence-thresholded robust test error (RErr) against L∞-constrained adversarial examples. This is because confidence needs to be taken into account during evaluation to allow a fair comparison between CCAT and standard adversarial training (AT).
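To make the metric concrete, here is a minimal sketch of a confidence-thresholded RErr computation in NumPy. The threshold selection (keeping a fixed fraction of correctly classified clean examples) and the normalization over non-rejected examples are simplifying assumptions and do not reproduce every detail of the paper's definition:

```python
import numpy as np

def thresholded_rerr(clean_conf, clean_correct, adv_conf, adv_correct, tpr=0.99):
    """Simplified confidence-thresholded robust test error (RErr).

    Assumption: the threshold tau is picked on clean examples so that a
    fraction `tpr` of correctly classified clean examples keeps
    confidence >= tau; low-confidence adversarial examples count as
    rejected rather than as errors.
    """
    # pick tau on the correctly classified clean examples
    tau = np.quantile(clean_conf[clean_correct], 1.0 - tpr)
    # a robust error is an adversarial example that is misclassified
    # *and* confident enough to pass the threshold (i.e., not rejected)
    errors = (~adv_correct) & (adv_conf >= tau)
    # normalize only over examples that are not rejected
    kept = adv_conf >= tau
    return errors.sum() / max(kept.sum(), 1)
```

Without the thresholding step, this reduces to the ordinary robust test error, which would unfairly penalize CCAT for low-confidence (rejectable) adversarial examples.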

For a start, the AAA paper includes results for ϵ = 8/255 ≈ 0.0314, which is slightly larger than the ϵ = 0.03 that I used in the CCAT paper. In the paper, we reported a RErr of 68.4% with ϵ = 0.03, while AAA obtains 60.46% (with ϵ = 8/255 ≈ 0.0314) using the “default” setting of 3 adaptive attacks. Searching for more attacks (8) instead, RErr increases to 73.13%. This shows that our model is less robust than previously reported, subject to the difference in ϵ used. So I also reran the AAA code with ϵ = 0.03 and obtained 65.6% RErr, which is slightly lower than the number originally reported. Re-training CCAT using a WRN-28-10 [ZK16] further reduces RErr to 54.5% and also increases the runtime of the attack significantly (which includes actually finding the adaptive attacks). Against the ResNet-20 that I originally trained, AAA requires 205 minutes to run and 659 minutes to find the attacks (according to the paper). On an NVIDIA™ Tesla® P40, considering the WRN-28-10, the attack and search time exceed 1000 and 2500 minutes, respectively. Overall, this means that CCAT actually holds up quite nicely against AAA.

More recently, in [Sch22], the attack used in CCAT training was adapted using the Armijo rule for backtracking, requiring 9 additional forward passes per iteration. Additionally, performing more than 100 restarts increases the attack time significantly. However, running standard PGD, my adapted PGD, and the proposed attack for 1000 iterations and 100 restarts each, RErr increases dramatically to 92.5%, suggesting very poor robustness. Interestingly, the proposed attack with only T = 100 iterations reaches just 59.9% RErr. This means that running many iterations is important. I didn't reproduce these results with the more robust WRN-28-10, though. However, as the runtime for the ResNet-20 already exceeds 2000 minutes, this shows that CCAT is extremely difficult to crack, even if it can eventually be fooled. Both attacks also highlight that more sophisticated attacks and more runtime are required to crack CCAT in comparison to other defenses or standard AT.
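The backtracking idea can be sketched as follows: at each PGD iteration, the step size is halved until an Armijo-style sufficient-increase condition holds, each trial costing one extra forward pass. This is only an illustration of the general scheme, not the exact attack from [Sch22]; the loss function, step sizes, and acceptance rule here are illustrative assumptions.

```python
import numpy as np

def pgd_backtracking(loss_grad, x0, eps, steps=100, lr=0.1, c=1e-4, max_bt=9):
    """L-infinity PGD with a simple Armijo-style backtracking line search.

    `loss_grad(x)` returns (loss, gradient); the attack maximizes the loss
    within an L-infinity ball of radius `eps` around `x0`, clipped to [0, 1].
    Sketch only: the actual attack in [Sch22] differs in its details.
    """
    x = x0.copy()
    loss, grad = loss_grad(x)
    for _ in range(steps):
        step = lr
        direction = np.sign(grad)  # steepest ascent direction under L-inf
        for _ in range(max_bt + 1):  # up to max_bt extra forward passes
            # project the candidate onto the eps-ball and the [0, 1] box
            x_new = np.clip(np.clip(x + step * direction, x0 - eps, x0 + eps), 0.0, 1.0)
            new_loss, new_grad = loss_grad(x_new)
            # Armijo-style condition: accept if the loss increased sufficiently
            if new_loss >= loss + c * step * np.abs(grad).sum():
                break
            step /= 2.0  # backtrack: halve the step size and retry
        if new_loss > loss:
            x, loss, grad = x_new, new_loss, new_grad
    return x
```

On a toy concave-free objective such as maximizing the squared distance from a point, the attack walks to the boundary of the ϵ-ball, which matches the intuition that backtracking only stabilizes, rather than changes, the PGD update.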

  • [Sch22] Christian Schlarmann. Confidence-calibrated adversarial training and out-of-distribution detection. Master’s thesis, University of Tübingen, February 2022.
  • [YBTV21] Chengyuan Yao, Pavol Bielik, Petar Tsankov, and Martin T. Vechev. Automated discovery of adaptive attacks on adversarial defenses. arXiv:2102.11860, 2021.
  • [ZK16] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In Proc. of the British Machine Vision Conference (BMVC), 2016.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.