Folz et al. propose an auto-encoder based defense against adversarial examples. In particular, they propose structure-to-signal auto-encoders, S2SNets, as defense mechanism – this auto-encoder is first trained in an unsupervised fashion to reconstruct images (which can be done independent of attack models or the classification network under attack). Then, the network’s decoder is fine tuned using gradients from the classification network. Their main argumentation is that the gradients of the composite network – auto-encoder plus classification network – are not class specific anymore as only the decoder is fine-tuned but not the encoder (as the encoder is trained to encode any image independent of the classification task). Experimentally they show that the gradients are indeed less class-specific. Additionally, the authors highlight that this defense is independent of an attack model and can be applied to any pre-trained classification model. Unforutntely, the approach is not compared against other defense machenisms – while related work was mentioned, a comparison would have been useful.
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: