ICML Talk "Confidence-Calibrated Adversarial Training" • David Stutz

Abstract

Check out the paper on ArXiv.

Adversarial training (AT), i.e., training on adversarial examples generating on-the-fly, is standard to obtain robust models within a specific threat model, e.g., $L_\infty$ adversarial examples. However, it has been shown that the obtained robustness does not generalize to attacks not seen during training such as larger $L_\infty$ perturbations or other $L_p$ threat models. Additionally, AT is known to reduce accuracy on clean examples. In this talk, I introduce Confidence-calibrated adversarial training (CCAT): CCAT tackles both problems by biasing the network towards low-confidence predictions on adversarial examples. By extrapolating low-confidence predictions beyond the $L_\infty$ adversarial examples seen during training, robustness generalizes to previously unseen attacks by rejecting low-confidence (adversarial) examples. Trained only on $L_\infty$ adversarial examples, I demonstrate improved robustness against $L_2$, $L_1$ and $L_0$ adversarial examples as well as adversarial frames. Furthermore, compared to AT, accuracy is improved.

Adversarial Training (AT):

Confidence-Calibrated Adversarial Training (CCAT):

Figure 1: Illustration of how confidence-calibrated adversarial training (CCAT, right) tackles the problems of poor generalization of the obtained robustness of standard adversarial training (AT, left).

Talk

The slides of the talk can be found below:

Slides

IAM

DAVIDSTUTZ

ARTICLE

ICML Talk “Confidence-Calibrated Adversarial Training”

Abstract

Talk

SEARCHTHEBLOG

ARCHIVES

TAGS