Eric Wong, J. Zico Kolter. Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope. ICML, 2018.

Wong and Kolter propose a method for learning provably-robust, deep, ReLU based networks by considering the so-called adversarial polytope of final-layer activations reachable through adversarial examples. Overall, the proposed approach has some similarities to adversarial training in that the overall objective can be written as

$\min_\theta \sum_{i = 1}^N \max_{\|\Delta\|_\infty \leq \epsilon} L(f_\theta(x_i + \Delta), y_i)$.

However, in contrast to previous work, the inner maximization problem (i.e. finding the optimal adversarial example to train on) can be avoided by considering the so-called “dual network” $g_\theta$ (note that the parameters $\theta$ are the same for $f$ and $g$):

$\min_\theta \sum_{i = 1}^N L(-J_\epsilon(x_i, g_\theta(e_{y_i}1^T – I)), y_i)$

where $e_{y_i}$ is the one-hot vector of class $y_i$ and $\epsilon$ the maximum perturbation allowed. Both the network $g_\theta$ and the objective $J_\epsilon$ are derive from the dual problem of a linear program corresponding to the adversarial perturbation objective.

Considering $z_k$ to be the activations of the final layer (e.g., the logits), a common objective for adversarial examples is

$\min_{\Delta} z_k(x + \Delta)_{y} – z_k(x + \Delta)_{y_{\text{target}}}$

with $x$ being a sample, and $y$ being true or target label. Wong and Kolter show that this can be rewritten as a linear program:

$\min_{\Delta} c^T z_k(x + \Delta)$.

Instead of minimizing over the perturbation $\Delta$, we can also optimize over the activations $z_k$ itself. We can even constrain the activations to a set $\tilde{Z}_\epsilon(x)$ around a sample $x$ in which we want the network’s activations to be robust. In the paper, this set is obtained using a convex relaxation of the ReLU network, where it is assumed that upper on lowe rbounds of all activations can be computed efficiently. The corresponding dual problem then involves

$\max_\alpha J_\epsilon(x, g_\theta(x, \alpha))$

with $g_\theta$ being the dual network. Details can be found in the paper; however, I wanted to illustrate the general idea. Because of the simple structure of the network, the dual network is almost idenitical to the true network and the required upper and lower bounds can be computed in a backward style computation.

In experiments, Wong and Kolter demonstrate that they can compute reasonable robustness guarantees for simple ReLu networks on MNIST. They also show that the classification boundaries of the learned networks are smoother; however, these experiments have only been conducted on simple 2D toy datasets (with significantly larger networks compared to MNIST).

Also find this summary on ShortScience.org.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.