Introduction
Confidence-calibrated adversarial training (CCAT) biases the network towards low-confidence predictions on adversarial examples. As a result, although trained only on $L_\infty$ adversarial examples, CCAT improves robustness against previously unseen attacks, such as adversarial examples in other $L_p$ norms or with larger perturbations. CCAT thereby addresses an important drawback of standard adversarial training [1].
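At its core, CCAT assigns adversarial examples a target distribution that interpolates between the one-hot ground truth and the uniform distribution depending on the perturbation size. The following is a minimal PyTorch sketch of this target computation, assuming the power transition with exponent $\rho$ from the paper; the helper name `ccat_target` and all shapes and defaults are illustrative assumptions, not the repository's actual API:

```python
import torch

def ccat_target(y, delta, epsilon, num_classes, rho=10.0):
    """Sketch of the CCAT target distribution (hypothetical helper,
    not the repository's API).

    y: (B,) ground truth labels
    delta: (B, C, H, W) adversarial perturbations
    epsilon: radius of the L_inf ball used during training
    rho: exponent of the power transition
    """
    # Per-example L_inf norm of the perturbation.
    norms = delta.flatten(1).abs().max(dim=1).values
    # Power transition: 1 for unperturbed inputs, 0 at the ball boundary.
    lam = (1 - torch.clamp(norms / epsilon, max=1.0)) ** rho
    one_hot = torch.nn.functional.one_hot(y, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    # Convex combination: small perturbations keep the true label,
    # large perturbations are pushed towards uniform (low confidence).
    return lam.unsqueeze(1) * one_hot + (1 - lam.unsqueeze(1)) * uniform
```

During training, the cross-entropy loss on adversarial examples is then computed against this soft target instead of the one-hot label.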
The code for confidence-calibrated adversarial training is now available on GitHub:

Confidence-Calibrated Adversarial Training on GitHub

The corresponding paper is available on ArXiv; also check out the project page:

Paper on ArXiv

```bibtex
@inproceedings{Stutz2020ICML,
    author = {David Stutz and Matthias Hein and Bernt Schiele},
    title = {Confidence-Calibrated Adversarial Training: Generalizing Robustness to Unseen Attacks},
    booktitle = {ICML},
    year = {2020}
}
```
Features
The repository includes the following implementations and features:
- Training procedures for:
    - Adversarial Training [1]
    - Confidence-Calibrated Adversarial Training
- Various white- and black-box adversarial attacks:
    - PGD [1] with backtracking
    - (Reference implementation of PGD without backtracking)
    - Corner Search [2]
    - Query Limited [3] with backtracking
    - ZOO [4] with backtracking
    - Adversarial Frames [5]
    - Geometry [6]
    - Square [7]
- Confidence-thresholded evaluation protocol (see the sketch after this list) for:
    - adversarial examples
    - distal adversarial examples
    - out-of-distribution examples
    - corrupted examples
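To make the evaluation protocol concrete: a confidence threshold is determined on clean examples, for example such that 99% of correctly classified clean examples are retained, and any example whose confidence falls below this threshold is rejected. A simplified NumPy sketch of this thresholded evaluation follows; the helper names and the exact normalization are illustrative assumptions, not the repository's evaluation code:

```python
import numpy as np

def confidence_threshold(clean_confidences, tpr=0.99):
    """Pick a threshold that retains a fraction `tpr` of the
    (correctly classified) clean examples; hypothetical helper."""
    return np.quantile(clean_confidences, 1.0 - tpr)

def thresholded_robust_error(labels, adv_predictions, adv_confidences, tau):
    """Fraction of adversarial examples that are both accepted
    (confidence >= tau) and misclassified; rejected examples do
    not count as errors under the thresholded protocol."""
    accepted = adv_confidences >= tau
    errors = adv_predictions != labels
    return np.mean(accepted & errors)
```

The same threshold is applied to distal adversarial, out-of-distribution, and corrupted examples, so low-confidence predictions on any of them can be rejected.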
More features:
- All attacks follow a common interface (see the sketch after this list), allow different objectives, initializations, and $L_p$ norms, and operate on batches.
- Adversarial training supports any of the included attacks and a variable fraction of adversarial examples per batch (for example, 100% or 50%).
- Confidence-calibrated adversarial training supports any of the included attacks as well as different losses and transition functions.
- Training supports data augmentation through imgaug and custom data loaders.
- Evaluation includes per-example worst-case analysis and multiple restarts per attack.
- Utilities, attacks and training are tested!
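The common attack interface can be pictured roughly as follows; the class and method names are invented for illustration and do not reproduce the repository's actual classes. The sketch shows a minimal $L_\infty$ PGD variant without backtracking implementing a batched, objective-parameterized interface:

```python
import torch

class Attack:
    """Hypothetical common interface: every attack operates on a batch
    and maximizes a pluggable objective within a norm constraint."""

    def run(self, model, inputs, labels, objective):
        raise NotImplementedError  # returns one perturbation per input

class LinfPGD(Attack):
    """Minimal L_inf PGD without backtracking; epsilon, step count,
    and step size are illustrative defaults."""

    def __init__(self, epsilon=8 / 255, steps=10, lr=2 / 255):
        self.epsilon, self.steps, self.lr = epsilon, steps, lr

    def run(self, model, inputs, labels, objective):
        delta = torch.zeros_like(inputs, requires_grad=True)
        for _ in range(self.steps):
            loss = objective(model(inputs + delta), labels)
            loss.backward()
            with torch.no_grad():
                # Ascend the objective and project back onto the epsilon
                # ball; clipping to the image range is omitted for brevity.
                delta += self.lr * delta.grad.sign()
                delta.clamp_(-self.epsilon, self.epsilon)
            delta.grad.zero_()
        return delta.detach()

# Usage sketch: objectives are pluggable, e.g., untargeted cross-entropy.
# attack = LinfPGD(epsilon=8 / 255)
# deltas = attack.run(model, images, labels, torch.nn.functional.cross_entropy)
```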
References
- [1] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
- [2] Croce, F. and Hein, M. Sparse and imperceivable adversarial attacks. arXiv.org, abs/1909.05040, 2019.
- [3] Ilyas, A., Engstrom, L., Athalye, A., and Lin, J. Black-box adversarial attacks with limited queries and information. In ICML, 2018.
- [4] Chen, P., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In AISec@CCS, 2017.
- [5] Zajac, M., Zolna, K., Rostamzadeh, N., and Pinheiro, P. O. Adversarial framing for image and video classification. In AAAI Workshops, 2019.
- [6] Khoury, M. and Hadfield-Menell, D. On the geometry of adversarial examples. arXiv.org, abs/1811.00525, 2018.
- [7] Andriushchenko, M., Croce, F., Flammarion, N., and Hein, M. Square attack: a query-efficient black-box adversarial attack via random search. arXiv.org, abs/1912.00049, 2019.