Properly evaluating defenses against adversarial examples has been difficult as adversarial attacks need to be adapted to each individual defense. This also holds for confidence-calibrated adversarial training, where robustness is obtained by rejecting adversarial examples based on their confidence. Thus, regular robustness metrics and attacks are not easily applicable. In this article, I want to discuss how to evaluate confidence-calibrated adversarial training in terms of metrics and attacks.
Taking adversarial training from this previous article as baseline, this article introduces a new, confidence-calibrated variant of adversarial training that addresses two significant flaws: First, trained with L∞ adversarial examples, adversarial training is not robust against L2 ones. Second, it incurs a significant increase in (clean) test error. Confidence-calibrated adversarial training addresses these problems by encouraging lower confidence on adversarial examples and subsequently rejecting them.
With our paper on conformal training, we showed how conformal prediction can be integrated into end-to-end training pipelines. There are so many interesting directions of how to improve and build upon conformal training. Unfortunately, I just do not have the bandwidth to pursue all of them. So, in this article, I want to share some research ideas so others can pick them up.
In March this year I finally submitted my PhD thesis and successfully defended in July. Now, more than 6 months later, my thesis is finally available in the university’s library. During my PhD, I worked on various topics surrounding robustness and uncertainty in deep learning, including adversarial robustness, robustness to bit errors, out-of-distribution detection and conformal prediction. In this article, I want to share my thesis and give an overview of its contents.