26th JUNE 2018

Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, Patrick D. McDaniel. Ensemble Adversarial Training: Attacks and Defenses. CoRR abs/1705.07204, 2017.

$x' = x + \alpha \text{sign}(\mathcal{N}(0, I))$
$x'' = x' + (\epsilon - \alpha)\text{sign}(\nabla_{x'} J(x', y))$
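where $J$ is the loss function and $y$ is the label corresponding to sample $x$. The first step moves $x$ by $\alpha$ in a random sign direction; the second takes a single gradient-sign step of size $\epsilon - \alpha$ from $x'$, so the total perturbation stays within $\epsilon$ in the $L_\infty$ norm (for $0 \leq \alpha \leq \epsilon$). In their experiments, the authors show that this attack achieves higher success rates on adversarially trained models than the single-step gradient-sign attack without the random step.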
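The two update steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `random_start_attack` and `loss_gradient` are hypothetical, and a tiny logistic-regression model with a hand-written input gradient stands in for the network, whereas the paper attacks deep networks. Clipping to a valid input range is omitted.

```python
import math
import random


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def loss_gradient(w, b, x, y):
    """Gradient w.r.t. x of the cross-entropy loss J(x, y) for a
    logistic-regression model (hypothetical stand-in for a network)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def random_start_attack(x, y, grad_fn, epsilon, alpha, rng):
    """Random sign step of size alpha, then one gradient-sign step
    of size (epsilon - alpha), following the two equations above."""
    # Step 1: x' = x + alpha * sign(N(0, I))
    x_prime = [xi + alpha * sign(rng.gauss(0.0, 1.0)) for xi in x]
    # Step 2: x'' = x' + (epsilon - alpha) * sign(grad_{x'} J(x', y))
    grad = grad_fn(x_prime, y)
    return [xp + (epsilon - alpha) * sign(g) for xp, g in zip(x_prime, grad)]


# Assumed toy values, chosen only for illustration.
rng = random.Random(0)
w, b = [1.5, -2.0, 0.5], 0.1
x, y = [0.2, 0.4, 0.6], 1
x_adv = random_start_attack(
    x, y, lambda xp, yp: loss_gradient(w, b, xp, yp),
    epsilon=0.3, alpha=0.15, rng=rng,
)
```

Because each coordinate moves by $\alpha$ and then by at most $\epsilon - \alpha$, every entry of `x_adv` differs from `x` by at most $\epsilon$, which is the $L_\infty$ budget of the attack.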