Lamb et al. introduce fortified networks, which insert denoising autoencoders into the hidden layers of a network. These denoising autoencoders are meant to learn the manifold of hidden representations, so that the hidden activations of adversarial inputs are projected back onto this manifold, which should improve robustness. The main idea is illustrated in Figure 1. The denoising autoencoders can be added at any layer and are trained jointly with the classification network, either on the original inputs or on adversarial examples, as in adversarial training.
Figure 1: Illustration of a fortified layer, i.e., a hidden layer whose activations are reconstructed by a denoising autoencoder as a defense mechanism. The denoising autoencoders are trained jointly with the network.
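To make the idea of a fortified layer concrete, the following is a minimal NumPy sketch of the forward pass only. It is not the authors' implementation: the single-layer encoder and decoder, the Gaussian corruption, the layer sizes, and the names `init_dae`, `fortified_layer`, `joint_loss`, `noise_std` and `lam` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_dae(hidden_dim, code_dim):
    """Randomly initialise a one-layer encoder/decoder pair (hypothetical sizes)."""
    scale = 1.0 / np.sqrt(hidden_dim)
    return {
        "W_enc": rng.normal(0.0, scale, (hidden_dim, code_dim)),
        "b_enc": np.zeros(code_dim),
        "W_dec": rng.normal(0.0, scale, (code_dim, hidden_dim)),
        "b_dec": np.zeros(hidden_dim),
    }


def fortified_layer(h, dae, noise_std=0.1, train=True):
    """Replace hidden activations h by their denoising-autoencoder reconstruction.

    Returns the reconstruction, which is passed on to the next layer, and the
    mean squared reconstruction error with respect to the uncorrupted h.
    """
    # Assumption: Gaussian corruption during training, no corruption at test time.
    h_in = h + noise_std * rng.normal(size=h.shape) if train else h
    code = np.maximum(0.0, h_in @ dae["W_enc"] + dae["b_enc"])  # ReLU encoder
    h_rec = code @ dae["W_dec"] + dae["b_dec"]                  # linear decoder
    rec_loss = np.mean((h_rec - h) ** 2)
    return h_rec, rec_loss


def joint_loss(classification_loss, rec_loss, lam=0.01):
    """Joint objective: classification loss plus weighted reconstruction loss.

    lam is a hypothetical trade-off weight, not a value taken from the paper.
    """
    return classification_loss + lam * rec_loss


# Example: a batch of 4 hidden vectors of dimension 16.
h = rng.normal(size=(4, 16))
dae = init_dae(hidden_dim=16, code_dim=8)
h_rec, rec_loss = fortified_layer(h, dae)
```

In a real network, `h` would be the output of some hidden layer, `h_rec` would replace it as input to the next layer, and the joint loss would be minimized over both the classifier and the autoencoder parameters, on clean or adversarial inputs.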
In experiments, they show that the proposed defense mechanism improves robustness on MNIST and CIFAR compared to adversarial training and other baselines. The improvements are, however, marginal, especially considering that the proposed method adds overhead on top of adversarial training.