I am looking for full-time (applied) research opportunities in industry, involving (trustworthy and robust) machine learning or (3D) computer vision, starting early 2022. Check out my CV and get in touch on LinkedIn!


ICML Talk “Confidence-Calibrated Adversarial Training”

Confidence-calibrated adversarial training (CCAT) addresses two problems when training on adversarial examples: the lack of robustness against adversarial examples unseen during training, and the reduced (clean) accuracy. In particular, CCAT biases the model towards predicting low-confidence on adversarial examples such that adversarial examples can be rejected by confidence thresholding. In this article, I want to share the slides of the corresponding ICML talk.


Check out the paper on ArXiv.

Adversarial training (AT), i.e., training on adversarial examples generating on-the-fly, is standard to obtain robust models within a specific threat model, e.g., $L_\infty$ adversarial examples. However, it has been shown that the obtained robustness does not generalize to attacks not seen during training such as larger $L_\infty$ perturbations or other $L_p$ threat models. Additionally, AT is known to reduce accuracy on clean examples. In this talk, I introduce Confidence-calibrated adversarial training (CCAT): CCAT tackles both problems by biasing the network towards low-confidence predictions on adversarial examples. By extrapolating low-confidence predictions beyond the $L_\infty$ adversarial examples seen during training, robustness generalizes to previously unseen attacks by rejecting low-confidence (adversarial) examples. Trained only on $L_\infty$ adversarial examples, I demonstrate improved robustness against $L_2$, $L_1$ and $L_0$ adversarial examples as well as adversarial frames. Furthermore, compared to AT, accuracy is improved.

Adversarial Training (AT):

Confidence-Calibrated Adversarial Training (CCAT):

Figure 1: Illustration of how confidence-calibrated adversarial training (CCAT, right) tackles the problems of poor generalization of the obtained robustness of standard adversarial training (AT, left).


The slides of the talk can be found below:


What is your opinion on this article? Did you find it interesting or useful? Let me know your thoughts in the comments below: