Tramer et al. study adversarial subspaces, i.e., subspaces of the input space that are spanned by multiple, mutually orthogonal adversarial perturbations. These subspaces are found by iteratively searching for orthogonal adversarial directions around a specific test example. This can, for example, be done using classical first- or second-order optimization methods for finding adversarial examples, with the additional constraint that each new direction is orthogonal to the ones found before. The authors also consider attack strategies that work on discrete input features. In practice, on MNIST, this yields, on average, 44 orthogonal adversarial directions per test example. This finding indicates that adversarial examples indeed span large adversarial subspaces. Additionally, adversarial examples from these subspaces seem to transfer reasonably well to other models. The remainder of the paper links this ease of transferability to a similarity in the decision boundaries learned by different models from the same hypothesis set.
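To make the idea of orthogonal adversarial directions more concrete, here is a minimal sketch of one first-order way to obtain them. It is an illustration under my own assumptions, not the authors' implementation: the function name `gradient_aligned_directions` is hypothetical, and the Householder reflection is just one convenient way to build `k` orthonormal directions that all have the same inner product `1/sqrt(k)` with the normalized loss gradient. To a first-order approximation, each of these directions increases the loss, so each is a candidate adversarial direction.

```python
import numpy as np


def gradient_aligned_directions(grad, k):
    """Return k orthonormal directions, each with inner product 1/sqrt(k)
    with the normalized gradient (hypothetical helper for illustration)."""
    g = np.asarray(grad, dtype=float).ravel()
    d = g.shape[0]
    if not 1 <= k <= d:
        raise ValueError("k must be between 1 and the input dimension")
    norm = np.linalg.norm(g)
    if norm == 0.0:
        raise ValueError("gradient must be non-zero")
    g = g / norm

    # z has inner product 1/sqrt(k) with each of the first k basis vectors.
    z = np.zeros(d)
    z[:k] = 1.0 / np.sqrt(k)

    # Rows of `basis` are the standard basis vectors e_1, ..., e_k.
    basis = np.eye(d)[:k]

    # Householder reflection Q = I - 2 v v^T / (v^T v) maps z to g.
    v = z - g
    vv = v @ v
    if vv < 1e-12:
        # g already equals z, so the basis vectors are the answer.
        return basis

    # Row i is Q e_i; Q is orthogonal, so inner products are preserved:
    # <g, Q e_i> = <Q z, Q e_i> = <z, e_i> = 1/sqrt(k).
    return basis - 2.0 * np.outer(basis @ v, v) / vv


# Toy usage with a random stand-in for the loss gradient of a 28x28 input.
rng = np.random.default_rng(0)
grad = rng.normal(size=784)
directions = gradient_aligned_directions(grad, k=44)
```

Whether a given direction is actually adversarial still has to be checked against the model, for example by adding a scaled copy of each row of `directions` to the test example and counting how many perturbed inputs are misclassified. Note that the alignment with the gradient shrinks as `1/sqrt(k)`, so asking for more directions makes each individual one weaker.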