Another alternative to the regular $L_p$-constrained adversarial examples that is additionally less visible than adversarial patches or frames are adversarial transformations such as small crops, rotations and translations. Similar to $L_p$ adversarial examples, adversarial transformations are often less visible unless the original image is available for direct comparison. In this article, I will include a PyTorch implementation and some results against adversarial training.
Adversarial patches and frames are an alternative to the regular $L_p$-constrained adversarial examples. Often, adversarial patches are thought to be more realistic — mirroring graffitis or stickers in the real world. In this article I want to discuss a simple PyTorch implementation and present some results of adversarial patches against adversarial training as well as confidence-calibrated adversarial training.
Out-of-distribution examples are images that are cearly irrelevant to the task at hand. Unfortunately, deep neural networks frequently assign random labels with high confidence to such examples. In this article, I want to discuss an adversarial way of computing high-confidence out-of-distribution examples, so-called distal adversarial examples, and how confidence-calibrated adversarial training handles them.
Properly evaluating defenses against adversarial examples has been difficult as adversarial attacks need to be adapted to each individual defense. This also holds for confidence-calibrated adversarial training, where robustness is obtained by rejecting adversarial examples based on their confidence. Thus, regular robustness metrics and attacks are not easily applicable. In this article, I want to discuss how to evaluate confidence-calibrated adversarial training in terms of metrics and attacks.
Taking adversarial training from this previous article as baseline, this article introduces a new, confidence-calibrated variant of adversarial training that addresses two significant flaws: First, trained with L∞ adversarial examples, adversarial training is not robust against L2 ones. Second, it incurs a significant increase in (clean) test error. Confidence-calibrated adversarial training addresses these problems by encouraging lower confidence on adversarial examples and subsequently rejecting them.
Series of articles discussing adversarial robustness and adversarial training in PyTorch.
Knowing how to compute adversarial examples from this previous article, it would be ideal to train models for which such adversarial examples do not exist. This is the goal of developing adversarially robust training procedures. In this article, I want to describe a particularly popular approach called adversarial training. The idea is to train on adversarial examples computed during training on-the-fly. I will also discuss a PyTorch implementation that obtains 47.9% robust test error — 52.1% robust accuracy — on CIFAR10 using a WRN-28-10 architecture.
Adversarial examples, slightly perturbed images causing mis-classification, have received considerable attention over the last few years. While many different adversarial attacks have been proposed, projected gradient descent (PGD) and its variants is widely spread for reliable evaluation or adversarial training. In this article, I want to present my implementation of PGD to generate L∞, L2, L1 and L0 adversarial examples. Besides using several iterations and multiple attempts, the worst-case adversarial example across all iterations is returned and momentum as well as backtracking strengthen the attack.
Since I worked on confidence-calibrated training (CCAT) some years ago, CCAT has been evaluated using novel attacks. In this article, I want to share some updated results and numbers and contrast the reported numbers with newer experiments that I ran.
In March this year I finally submitted my PhD thesis and successfully defended in July. Now, more than 6 months later, my thesis is finally available in the university’s library. During my PhD, I worked on various topics surrounding robustness and uncertainty in deep learning, including adversarial robustness, robustness to bit errors, out-of-distribution detection and conformal prediction. In this article, I want to share my thesis and give an overview of its contents.