Galloway et al. argue that batch normalization reduces robustness against noise and adversarial examples. On various vision datasets, including SVHN and ImageNet, and with popular self-trained and pre-trained models, they empirically demonstrate that networks with batch normalization show reduced accuracy on noisy and adversarial examples. As noise, they consider additive Gaussian noise as well as the different noise types included in the CIFAR-C dataset. For adversarial examples, they consider $L_\infty$ and $L_2$ PGD and BIM attacks; I refer to the paper for details and hyperparameters. On noise, all networks perform worse with batch normalization, even though batch normalization slightly increases clean accuracy. Against PGD attacks, the provided experiments also suggest that batch normalization reduces robustness; however, the attacks only use 20 iterations and do not manage to reduce the adversarial accuracy to near zero, as is commonly reported. Thus, it is questionable whether batch normalization indeed makes a significant difference regarding adversarial robustness. Finally, the authors argue that replacing batch normalization with weight decay can recover some of the advantage in terms of accuracy and robustness.
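To make the concern about the iteration budget concrete, here is a minimal sketch of an $L_\infty$ PGD attack in NumPy. It is not the authors' setup: I assume a linear softmax classifier on synthetic Gaussian data instead of a deep network, and the values of `epsilon`, `step_size` and the iteration counts are hypothetical choices for illustration only. The attack starts at the clean input without random initialization.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy loss for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def input_gradient(x, labels, W, b):
    """Gradient of the per-example loss with respect to the input x."""
    logits = x @ W + b
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(labels)), labels] -= 1.0  # d loss / d logits
    return probs @ W.T  # chain rule through the linear layer

def pgd_linf(x, labels, W, b, epsilon, step_size, iterations):
    """L_inf PGD: signed gradient ascent on the loss, projected onto the epsilon-ball."""
    x_adv = x.copy()
    for _ in range(iterations):
        grad = input_gradient(x_adv, labels, W, b)
        x_adv = x_adv + step_size * np.sign(grad)
        # Projection onto the L_inf ball around the clean input.
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

def accuracy(x, labels, W, b):
    return float(np.mean((x @ W + b).argmax(axis=1) == labels))

# Synthetic stand-in for a dataset: three Gaussian classes in 20 dimensions.
rng = np.random.default_rng(0)
num_classes, dim, num_examples = 3, 20, 300
means = rng.normal(size=(num_classes, dim))
labels = rng.integers(0, num_classes, size=num_examples)
x = means[labels] + rng.normal(size=(num_examples, dim))

# Stand-in for a trained model: a nearest-class-mean linear classifier.
W = means.T
b = -0.5 * (means ** 2).sum(axis=1)

print("clean accuracy:", accuracy(x, labels, W, b))
for iterations in (1, 20, 100):  # hypothetical iteration budgets
    x_adv = pgd_linf(x, labels, W, b, epsilon=0.5, step_size=0.05,
                     iterations=iterations)
    print(iterations, "iterations:", accuracy(x_adv, labels, W, b))
```

For this linear model, the loss is convex in the input, so additional iterations of this deterministic attack can only increase the loss. For deep networks there is no such guarantee, which is why an attack with a small iteration budget that leaves accuracy well above zero may underestimate the actual vulnerability.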