Alhussein Fawzi, Omar Fawzi, Pascal Frossard. Fundamental limits on adversarial robustness. ICML Workshop on Deep Learning, 2015.

Fawzi et al. provide upper bounds on the robustness of linear and quadratic classifiers. As many modern neural networks can be seen as piece-wise linear functions (using convolutions, $\text{ReLU}$ activations and max pooling), I found the part on linear classifiers most interesting. Specifically, robustness is defined as the expected norm of the smallest perturbation needed to change the label of a sample (in practice, the average of the smallest perturbations over all samples in a set). Motivated by a toy example (which I found quite artificial and not very practical), the authors show that the robustness $\rho(f)$ of the classifier $f$ is bounded as follows:

$\rho(f) \leq \frac{1}{2}\|\mathbb{E}_{\mu_1}[x] - \mathbb{E}_{\mu_{-1}}[x]\|_2 + 2MR(f)$

where $\mu_1$ and $\mu_{-1}$ are the data distributions of classes $1$ and $-1$, all samples are bounded by $\|x\|_2 \leq M$ and $R(f)$ is the risk of the classifier. Here, the authors assume that the binary classification task is balanced. Personally, the exact form is of less importance; however, the two parts of the bound are interesting – i.e. the risk on the one hand and the separability of the distributions on the other hand. This presents a dilemma: minimizing the risk also shrinks the upper bound on robustness, so an accurate classifier cannot be more robust than the separability of the class distributions allows …
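To get a feeling for the two terms, here is a minimal sketch on a toy 2D Gaussian task. Everything here – the data, the classifier $w, b$ (the difference-of-means direction) and the sample bound $M$ – is illustrative and not taken from the paper; for a linear classifier the smallest $L_2$ perturbation flipping the label of $x$ is simply $|w^T x + b| / \|w\|_2$, which makes $\rho(f)$ easy to estimate empirically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy balanced binary task: two Gaussian blobs for classes +1 and -1.
n = 1000
x_pos = rng.normal(loc=+2.0, scale=1.0, size=(n, 2))
x_neg = rng.normal(loc=-2.0, scale=1.0, size=(n, 2))
X = np.vstack([x_pos, x_neg])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Illustrative linear classifier f(x) = w.x + b along the
# difference-of-means direction (not a trained model).
w = x_pos.mean(axis=0) - x_neg.mean(axis=0)
b = -0.5 * w.dot(x_pos.mean(axis=0) + x_neg.mean(axis=0))

# Smallest L2 perturbation crossing the decision boundary: |w.x + b| / ||w||_2.
scores = X.dot(w) + b
rho = np.mean(np.abs(scores) / np.linalg.norm(w))  # empirical robustness rho(f)

# The two terms of the bound: separability of the class means, and the risk.
separability = 0.5 * np.linalg.norm(x_pos.mean(axis=0) - x_neg.mean(axis=0))
M = np.max(np.linalg.norm(X, axis=1))  # all samples satisfy ||x||_2 <= M
risk = np.mean(np.sign(scores) != y)   # empirical risk R(f)
bound = separability + 2 * M * risk

print(f"rho(f) = {rho:.3f}, separability term = {separability:.3f}, "
      f"risk term = {2 * M * risk:.3f}")
```

With well-separated blobs the risk term nearly vanishes, and the separability term alone caps the achievable robustness – which is exactly the dilemma: pushing the risk to zero leaves only the separability of the distributions as robustness budget.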

Fawzi et al. derive a similar bound for quadratic classifiers. The bound has the same two parts, except that the “separability” of the distributions is not expressed in terms of means, but in terms of second-order statistics. This might make quadratic classifiers more robust, but the dilemma from above remains.

What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.