Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. CoRR abs/1706.06083, 2017.

Madry et al. provide an interpretation of training on adversarial examples as saddle-point (i.e. min-max) problem. Based on this formulation, they conduct several experiments on MNIST and CIFAR-10 supporting the following conclusions:

  • Projected gradient descent might be “strongest” adversary using first-order information. Here, gradient descent is used to maximize the loss of the classifier directly while always projecting onto the set of “allowed” perturbations (e.g. within an $\epsilon$-ball around the samples). This observation is based on a large number of random restarts used for projected gradient descent. Regarding the number of restarts, the authors also note that an adversary should be bounded regarding the computation resources – similar to polynomially bounded adversaries in cryptography.
  • Network capacity plays an important role in training robust neural networks using the min-max formulation (i.e. using adversarial training). In particular, the authors suggest that increased capacity is needed to fit/learn adversarial examples without overfitting. Additionally, increased capacity (in combination with a strong adversary) decreases transferability of adversarial examples.
Also find this summary on ShortScience.org.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: