Mitali Bafna, Jack Murtagh, Nikhil Vyas. Thwarting Adversarial Examples: An L_0-Robust Sparse Fourier Transform. NeurIPS 2018: 10096-10106.

Bafna et al. show that iterative hard thresholding results in $L_0$ robust Fourier transforms. In particular, as shown in Algorithm 1, iterative hard thresholding assumes a signal $y = x + e$ where $x$ is assumed to be sparse, and $e$ is assumed to be sparse. This translates to noise $e$ that is bounded in its $L_0$ norm, corresponding to common adversarial attacks such as adversarial patches in computer vision. Using their algorithm, the authors can provably reconstruct the signal, specifically the top-$k$ coordinates for a $k$-sparse signal, which can subsequently be fed to a neural network classifier. In experiments, the classifier is always trained on sparse signals, and at test time, the sparse signal is reconstructed prior to the forward pass. This way, on MNIST and Fashion-MNIST, the algorithm is able to recover large parts of the original accuracy.

Algorithm 1 (see paper for details): The iterative hard thresholding algorithm resulting in provable robustness against $L_0$ attack on images and other signals.

Also find this summary on ShortScience.org.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.