Hadi Salman, Jerry Li, Ilya P. Razenshteyn, Pengchuan Zhang, Huan Zhang, Sébastien Bubeck, Greg Yang. Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers. NeurIPS 2019: 11289-11300.

Salman et al. combine randomized smoothing with adversarial training, using an attack designed specifically against smoothed classifiers. They build on the formulation of randomized smoothing by Cohen et al. [1]: Gaussian noise is sampled around the input (adversarial or clean), and the smoothed classifier predicts by a simple majority vote over the base classifier's predictions on these noisy samples. In [1], Cohen et al. show that this yields good certified bounds on robustness. In this paper, Salman et al. propose an adaptive attack against randomized smoothing. Essentially, they run a simple PGD attack on the smoothed classifier, i.e., they maximize its cross-entropy loss. Because this objective involves an expectation over the Gaussian noise, it is made tractable by drawing Monte Carlo samples in each iteration of the PGD optimization. Based on this attack, they perform adversarial training, with adversarial examples computed against the smoothed classifier that is being adversarially trained. In experiments, this approach improves over the certified robustness reported by Cohen et al. on several datasets.
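To make the attack concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. As assumptions for illustration, the base classifier is a toy linear softmax model (`W`, `b`), the perturbation is constrained to an L2 ball of radius `eps`, and the loss of the smoothed classifier is approximated by the cross-entropy of the softmax output averaged over Monte Carlo noise samples. All function and parameter names are hypothetical.

```python
import math
import random


def logits(W, b, x):
    """Linear base classifier (a stand-in for a deep network)."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_predict(W, b, x, sigma, n_samples, rng):
    """Majority vote of the base classifier under Gaussian input noise."""
    votes = [0] * len(b)
    for _ in range(n_samples):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        z = logits(W, b, noisy)
        votes[z.index(max(z))] += 1
    return votes.index(max(votes))


def smoothed_loss_and_grad(W, b, x, y, noises):
    """Cross-entropy of the noise-averaged softmax and its gradient w.r.t. x."""
    n, d, k = len(noises), len(x), len(b)
    avg_p_y = 0.0
    grad_p_y = [0.0] * d
    for noise in noises:
        p = softmax(logits(W, b, [v + e for v, e in zip(x, noise)]))
        avg_p_y += p[y] / n
        for j in range(d):
            # d p_y / d x_j = p_y * (W[y][j] - sum_c p_c * W[c][j])
            mean_w = sum(p[c] * W[c][j] for c in range(k))
            grad_p_y[j] += p[y] * (W[y][j] - mean_w) / n
    avg_p_y = max(avg_p_y, 1e-12)  # guard against log(0)
    return -math.log(avg_p_y), [-g / avg_p_y for g in grad_p_y]


def pgd_smoothed(W, b, x, y, sigma, eps, step, n_steps, n_samples, rng):
    """L2 PGD that ascends the Monte Carlo estimate of the smoothed loss."""
    delta = [0.0] * len(x)
    for _ in range(n_steps):
        # Fresh Monte Carlo noise samples in every PGD iteration.
        noises = [[rng.gauss(0.0, sigma) for _ in x] for _ in range(n_samples)]
        x_adv = [v + dv for v, dv in zip(x, delta)]
        _, grad = smoothed_loss_and_grad(W, b, x_adv, y, noises)
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        delta = [dv + step * g / norm for dv, g in zip(delta, grad)]
        # Project back onto the L2 ball of radius eps.
        d_norm = math.sqrt(sum(dv * dv for dv in delta))
        if d_norm > eps:
            delta = [dv * eps / d_norm for dv in delta]
    return [v + dv for v, dv in zip(x, delta)]
```

For adversarial training, the examples returned by a routine like `pgd_smoothed` would be perturbed with Gaussian noise and used in place of the clean inputs when updating the base classifier. That training loop is omitted here.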

  • [1] Jeremy M. Cohen, Elan Rosenfeld and J. Zico Kolter. Certified Adversarial Robustness via Randomized Smoothing. ArXiv, 1902.02918, 2019.