
JUNE 2018

READING

Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. CoRR abs/1412.6572, 2014.

Goodfellow et al. introduce the fast gradient sign method (FGSM) to craft adversarial examples and offer a possible explanation of why adversarial examples exist, based on the linear behavior of models. FGSM is a gradient-based, one-step method for generating adversarial examples. In particular, letting $J$ be the objective optimized during training and $\epsilon$ be the maximum $L_\infty$-norm of the adversarial perturbation, FGSM computes

$x' = x + \eta = x + \epsilon \text{sign}(\nabla_x J(x, y))$

where $y$ is the label of sample $x$ and the $\text{sign}$ function is applied element-wise, so each component of $x$ changes by exactly $\epsilon$ (or stays unchanged where the gradient is zero). The paper demonstrates the method on several examples, and it is commonly used in related work.
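To make the update concrete, here is a minimal NumPy sketch of FGSM. It assumes a binary logistic-regression model so that $\nabla_x J(x, y)$ has a closed form; the function names (`logistic_loss`, `logistic_loss_grad_x`, `fgsm`) and the random weights are hypothetical and only serve as an illustration, not as the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(x, y, w, b):
    """Binary cross-entropy J(x, y) of a logistic-regression model, y in {0, 1}.

    Written as softplus(z) - y * z with z = w^T x + b for numerical stability.
    """
    z = w @ x + b
    return np.logaddexp(0.0, z) - y * z

def logistic_loss_grad_x(x, y, w, b):
    """Closed-form input gradient: dJ/dx = (sigmoid(w^T x + b) - y) * w."""
    return (sigmoid(w @ x + b) - y) * w

def fgsm(x, y, w, b, epsilon):
    """One-step FGSM: x' = x + epsilon * sign(grad_x J(x, y))."""
    eta = epsilon * np.sign(logistic_loss_grad_x(x, y, w, b))
    return x + eta

# Toy example with hypothetical random weights and input.
rng = np.random.default_rng(0)
w = rng.normal(size=100)
b = 0.0
x = 0.1 * rng.normal(size=100)
y = float(w @ x + b > 0)  # use the model's own prediction as the label
x_adv = fgsm(x, y, w, b, epsilon=0.1)

print("max |x' - x|:", np.abs(x_adv - x).max())
print("loss before:", logistic_loss(x, y, w, b), "after:", logistic_loss(x_adv, y, w, b))
```

For a deep network, `logistic_loss_grad_x` would be replaced by a backward pass through the network (e.g. via automatic differentiation); the update itself remains the same single step.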

In the remainder of the paper, Goodfellow et al. discuss a linear interpretation of why adversarial examples exist. Specifically, considering the dot product

$w^T x' = w^T x + w^T \eta$

it becomes apparent that the perturbation $\eta$, although insignificant on a per-pixel level (each component satisfies $|\eta_i| \leq \epsilon$), can change the activation of a single neuron significantly: for $\eta = \epsilon\,\text{sign}(w)$, the change is $w^T \eta = \epsilon \|w\|_1$, which grows linearly with the dimensionality $n$ of $x$. Hence, the effect is more pronounced the higher the dimensionality of $x$. Additionally, many network architectures today use $\text{ReLU}$ activations, which are piecewise linear, so these networks also behave largely linearly.
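The dimensionality argument can be checked numerically. The sketch below assumes $\eta = \epsilon\,\text{sign}(w)$ and Gaussian random weights, both chosen purely for illustration (the helper name is hypothetical): the largest component of $\eta$ stays at $\epsilon$, while $w^T \eta$ equals $\epsilon \|w\|_1$ and grows roughly linearly with $n$.

```python
import numpy as np

def worst_case_activation_change(n, epsilon, rng):
    """Return (w^T eta, epsilon * ||w||_1, max_i |eta_i|) for eta = epsilon * sign(w)."""
    w = rng.normal(size=n)        # illustrative random weights
    eta = epsilon * np.sign(w)    # every component has magnitude epsilon
    return w @ eta, epsilon * np.abs(w).sum(), np.abs(eta).max()

rng = np.random.default_rng(0)
epsilon = 0.01
changes = {}
for n in (10, 1_000, 100_000):
    change, l1_term, max_component = worst_case_activation_change(n, epsilon, rng)
    changes[n] = change
    print(f"n={n:>7}: max|eta_i|={max_component:.3f}  "
          f"w^T eta={change:.3f}  eps*||w||_1={l1_term:.3f}")
```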

Goodfellow et al. conduct several more experiments; I want to highlight the conclusions of some of them:

  • Training on adversarial examples can be seen as regularization. In their experiments, it is more effective than $L_1$ regularization or adding random noise.
  • The direction of the perturbation matters most. Adversarial examples may transfer between models because similar models learn similar functions, so the same perturbation directions are similarly effective across them.
  • Ensembles are not necessarily resistant to adversarial perturbations.
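The regularization view in the first point comes from training on a mixed objective, which the paper writes as $\tilde{J}(\theta, x, y) = \alpha J(\theta, x, y) + (1 - \alpha) J(\theta, x + \epsilon\,\text{sign}(\nabla_x J(\theta, x, y)), y)$ with $\alpha = 0.5$. Below is a minimal sketch of that objective for a logistic-regression model on hypothetical synthetic data; the model, data, learning rate, and step count are assumptions for illustration only. The FGSM examples are regenerated from the current parameters at every step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_training_step(X, Y, w, b, epsilon, alpha=0.5, lr=0.1):
    """One gradient-descent step on alpha * J(x, y) + (1 - alpha) * J(x_adv, y).

    Logistic regression with X of shape (N, d) and labels Y in {0, 1}.
    x_adv is the FGSM example for the current (w, b); it is held fixed when
    differentiating w.r.t. (w, b), which is exact almost everywhere because
    sign(.) is piecewise constant.
    """
    p = sigmoid(X @ w + b)
    # FGSM per sample: grad_x J = (p - y) * w, so x_adv = x + eps * sign((p - y) * w).
    X_adv = X + epsilon * np.sign((p - Y)[:, None] * w[None, :])
    p_adv = sigmoid(X_adv @ w + b)
    n = X.shape[0]
    # Batch-averaged parameter gradients of the mixed cross-entropy loss.
    grad_w = (alpha * X.T @ (p - Y) + (1 - alpha) * X_adv.T @ (p_adv - Y)) / n
    grad_b = (alpha * np.sum(p - Y) + (1 - alpha) * np.sum(p_adv - Y)) / n
    return w - lr * grad_w, b - lr * grad_b

# Hypothetical linearly separable toy data.
rng = np.random.default_rng(0)
N, d = 200, 20
true_w = rng.normal(size=d)
X = rng.normal(size=(N, d))
Y = (X @ true_w > 0).astype(float)

w, b = np.zeros(d), 0.0
for _ in range(200):
    w, b = adversarial_training_step(X, Y, w, b, epsilon=0.1)

clean_accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == Y)
print("clean training accuracy:", clean_accuracy)
```

Setting `alpha=1.0` recovers ordinary logistic-regression training, which makes it easy to compare the two objectives on the same data.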
Also find this summary on ShortScience.org.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.