Sayantan Sarkar, Ankan Bansal, Upal Mahbub, Rama Chellappa. UPSET and ANGRI : Breaking High Performance Image Classifiers. CoRR abs/1707.01159 (2017).

Sarkar et al. propose two “learned” adversarial example attacks, UPSET and ANGRI. The former, UPSET, learns to predict universal, targeted adversarial examples. The latter, ANGRI, learns to predict (non-universal) targeted adversarial attacks. For UPSET, a network takes the target label as input and learns to predict a perturbation, which added to the original image results in mis-classification; for ANGRI, a network takes both the target label and the original image as input to predict a perturbation. These networks are then trained using a mis-classification loss while also minimizing the norm of the perturbation. To this end, the target classifier needs to be differentiable – i.e., UPSET and ANGRI require white-box access.

Also find this summary on ShortScience.org.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.