Code Released: Confidence-Calibrated Adversarial Training

Introduction

Confidence-calibrated adversarial training (CCAT) biases the network towards low-confidence predictions on adversarial examples, so that they can be rejected by thresholding the confidence. As a result, although trained only on $L_\infty$ adversarial examples, it improves robustness against previously unseen attacks, such as other $L_p$ adversarial examples or larger perturbations. CCAT thereby addresses an important drawback of standard adversarial training [].
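
To make the idea of low-confidence targets concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: it assumes a power transition of the form $\lambda(\delta) = (1 - \min(1, \|\delta\|_\infty / \epsilon))^\rho$ that interpolates between the one-hot label and the uniform distribution, and the function names `power_transition` and `ccat_target` as well as the default `rho` are chosen here for illustration only.

```python
def power_transition(delta, epsilon, rho=10.0):
    """Weight on the true label: 1 without perturbation, 0 once ||delta||_inf >= epsilon.

    Assumed form of the transition; `rho` controls how quickly the weight decays.
    """
    linf = max((abs(d) for d in delta), default=0.0)
    return (1.0 - min(1.0, linf / epsilon)) ** rho


def ccat_target(label, num_classes, delta, epsilon, rho=10.0):
    """Convex combination of the one-hot label and the uniform distribution."""
    lam = power_transition(delta, epsilon, rho)
    uniform = 1.0 / num_classes
    return [
        lam * (1.0 if k == label else 0.0) + (1.0 - lam) * uniform
        for k in range(num_classes)
    ]


# Without perturbation the target is the one-hot label,
# at the full perturbation budget it is the uniform distribution.
clean_target = ccat_target(2, 10, [0.0, 0.0], epsilon=0.03)
adversarial_target = ccat_target(2, 10, [0.03, -0.01], epsilon=0.03)
```

Training against such targets, for example with a cross-entropy loss, is what pushes the predicted confidence down as the perturbation grows.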

The code for confidence-calibrated adversarial training is now available on GitHub:

Confidence-Calibrated Adversarial Training on GitHub

The corresponding paper is available on arXiv; also check out the project page:

Paper on arXiv

@inproceedings{Stutz2020ICML,
    author    = {David Stutz and Matthias Hein and Bernt Schiele},
    title     = {Confidence-Calibrated Adversarial Training: Generalizing Robustness to Unseen Attacks},
    booktitle = {ICML},
    year      = {2020}
}

Features

The repository includes the following implementations and features:

Training procedures for:
- Adversarial Training []
- Confidence-Calibrated Adversarial Training
Various white- and black-box adversarial attacks:
- PGD [] with backtracking
- (Reference implementation of PGD without backtracking)
- Corner Search []
- Query Limited [] with backtracking
- ZOO [] with backtracking
- Adversarial Frames []
- Geometry []
- Square []
Confidence-thresholded evaluation protocol for:
- Adversarial examples
- Distal adversarial examples
- Out-of-distribution examples
- Corrupted examples
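
As a rough illustration of what confidence-thresholded evaluation means, the following sketch picks a confidence threshold on clean examples and then measures the error only on examples that are not rejected. It is a simplified stand-in, not the exact metric or code from the repository: the helper names `confidence_threshold` and `thresholded_error`, and the rule of fixing the threshold at a given true positive rate on correctly classified clean examples, are assumptions made for this example.

```python
import math


def confidence_threshold(clean_confidences, clean_correct, tpr=0.99):
    """Pick tau such that at least a fraction `tpr` of the correctly
    classified clean examples has confidence >= tau."""
    kept = sorted(c for c, ok in zip(clean_confidences, clean_correct) if ok)
    if not kept:
        raise ValueError("no correctly classified clean examples")
    # Index of the lowest confidence that is still accepted.
    index = math.floor((1.0 - tpr) * len(kept))
    return kept[index]


def thresholded_error(confidences, errors, tau):
    """Fraction of errors among examples with confidence >= tau."""
    accepted = [err for c, err in zip(confidences, errors) if c >= tau]
    if not accepted:
        return 0.0  # everything was rejected
    return sum(accepted) / len(accepted)


# Threshold from clean examples, then evaluate on (hypothetical) adversarial ones.
tau = confidence_threshold([0.9, 0.8, 0.99, 0.6], [True, True, True, False], tpr=0.5)
error = thresholded_error([0.95, 0.5, 0.92], [True, True, False], tau)
```

In this toy run, the adversarial example with confidence 0.5 is rejected and therefore does not count as an error, which is exactly the effect low-confidence predictions on adversarial examples are meant to have.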

More features:

- All attacks follow a common interface, allow different objectives, initializations and $L_p$ norms, and operate on batches.
- Adversarial training supports any of the included attacks and a variable fraction of adversarial examples per batch (for example, 100% or 50%).
- Confidence-calibrated adversarial training supports any of the included attacks as well as different losses and transition functions.
- Training supports data augmentation through imgaug and custom data loaders.
- Evaluation includes per-example worst-case analysis and multiple restarts per attack.
- Utilities, attacks and training are covered by tests.
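
The per-example worst-case analysis over multiple restarts can be pictured with the small sketch below. It shows one plausible reading, not the repository's code: for every test example, the worst case is taken to be the attempt that causes an error with the highest confidence, and the data layout `attempts[r][n] = (is_error, confidence)` together with the name `worst_case_per_example` is hypothetical.

```python
def worst_case_per_example(attempts):
    """Select, for each example, the worst attempt across restarts or attacks.

    attempts[r][n] is a tuple (is_error, confidence) for restart r and example n.
    """
    num_examples = len(attempts[0])
    worst = []
    for n in range(num_examples):
        candidates = [run[n] for run in attempts]
        # Tuples compare element-wise: errors (True) rank above non-errors,
        # and ties are broken by the higher confidence.
        worst.append(max(candidates))
    return worst


# Two restarts on two examples.
attempts = [
    [(False, 0.9), (True, 0.4)],  # restart 1
    [(True, 0.6), (True, 0.7)],   # restart 2
]
worst = worst_case_per_example(attempts)
```

Aggregating per example rather than per attack gives a stricter picture of robustness, since a single successful, high-confidence attempt is enough to count an example as broken.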

References

[] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. ICLR, 2018.
[] Croce, F. and Hein, M. Sparse and imperceivable adversarial attacks. arXiv.org, abs/1909.05040, 2019.
[] Ilyas, A., Engstrom, L., Athalye, A., and Lin, J. Black-box adversarial attacks with limited queries and information. ICML, 2018.
[] Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C.-J. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. AISec@CCS, 2017.
[] Zajac, M., Zolna, K., Rostamzadeh, N., and Pinheiro, P. O. Adversarial framing for image and video classification. AAAI Workshops, 2019.
[] Khoury, M. and Hadfield-Menell, D. On the geometry of adversarial examples. arXiv.org, abs/1811.00525, 2018.
[] Andriushchenko, M., Croce, F., Flammarion, N., and Hein, M. Square attack: a query-efficient black-box adversarial attack via random search. arXiv.org, abs/1912.00049, 2019.
