Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. CoRR abs/1706.06083, 2017.

Madry et al. provide an interpretation of training on adversarial examples as saddle-point (i.e. min-max) problem. Based on this formulation, they conduct several experiments on MNIST and CIFAR-10 supporting the following conclusions:

  • Projected gradient descent might be “strongest” adversary using first-order information. Here, gradient descent is used to maximize the loss of the classifier directly while always projecting onto the set of “allowed” perturbations (e.g. within an $\epsilon$-ball around the samples). This observation is based on a large number of random restarts used for projected gradient descent. Regarding the number of restarts, the authors also note that an adversary should be bounded regarding the computation resources – similar to polynomially bounded adversaries in cryptography.
  • Network capacity plays an important role in training robust neural networks using the min-max formulation (i.e. using adversarial training). In particular, the authors suggest that increased capacity is needed to fit/learn adversarial examples without overfitting. Additionally, increased capacity (in combination with a strong adversary) decreases transferability of adversarial examples.
Also find this summary on ShortScience.org.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.