Liu and Hsieh combine adversarial training and generative adversarial networks (GANs) to obtain AdvGAN, a model that improves both generative performance and classification robustness. In particular, as illustrated in Figure 1, they propose a two-step approach. In the first step, an AC-GAN is trained (a GAN where, in addition to real/fake classification, the discriminator also learns to solve a classification problem on the images). However, instead of giving the discriminator access to the real images, adversarial examples of these real images are used. The adversarial examples are crafted to maximize the discriminator's loss on the classification task. In the second step, the discriminator is fine-tuned with respect to the classification task on both real and fake images.
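To make the "adversarial examples that maximize the classification loss" step more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: a tiny linear logistic classifier stands in for the discriminator's classification head, and the attack is a projected-gradient ascent inside an $L_\infty$ ball of radius `eps`, which is an assumption on my part since the summary above does not name the attack. All names and numbers (`pgd_attack`, `step_size`, the weights, the input) are hypothetical.

```python
import math

def logistic_loss(w, b, x, y):
    """Classification loss log(1 + exp(-y * (w.x + b))) for a label y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return math.log1p(math.exp(-margin))

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(w, b, x, y, eps, step_size, num_steps):
    """Gradient ascent on the loss, projected back into the L-infinity ball around x."""
    x_adv = list(x)
    for _ in range(num_steps):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        # Step in the direction that increases the classification loss.
        x_adv = [xa + step_size * sign(g) for xa, g in zip(x_adv, grad)]
        # Project: each coordinate stays within eps of the clean input.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

# Hypothetical stand-in for the discriminator's classifier and one "real image".
w, b = [1.0, -2.0], 0.5
x_clean, y = [0.5, -0.25], 1
eps = 0.1

x_adv = pgd_attack(w, b, x_clean, y, eps=eps, step_size=0.05, num_steps=5)
clean_loss = logistic_loss(w, b, x_clean, y)
adv_loss = logistic_loss(w, b, x_adv, y)
print(clean_loss, adv_loss)
```

In the first training step described above, the discriminator would then be updated on `x_adv` instead of `x_clean`; in a real setup the gradient would come from automatic differentiation through the discriminator network rather than from a closed-form expression.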
AdvGAN is motivated by two key observations. First, Liu and Hsieh noticed a generalization gap in adversarial training. Specifically, the generalization gap, as illustrated in Figure 2, is the gap between the accuracy on the training set and the accuracy on the test set of an adversarially trained model. The main observation is that this gap changes as the strength of the attack increases. I want to note, however, that it does not increase monotonically. The second idea is that a robust discriminator might improve generative performance. In particular, they argue that a robust discriminator will lead to more significant updates of the generator (see the paper).
Figure 2: Generalization gap as observed by the authors. The yellow line is the accuracy on the training set; the blue line is the accuracy on the test set. The $x$-axis shows the attack strength (i.e., the $\epsilon$ of the attack). The generalization gap grows quickly for small $\epsilon$, but shrinks again for larger $\epsilon$.
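One way to write down the quantity plotted in Figure 2, as a sketch in my own notation rather than the paper's (the symbols $\mathrm{Acc}_S$, $f$, $\mathcal{L}$ and the choice of an $L_\infty$ budget are assumptions):

$$\mathrm{gap}(\epsilon) = \mathrm{Acc}_{\text{train}}(\epsilon) - \mathrm{Acc}_{\text{test}}(\epsilon), \qquad \mathrm{Acc}_{S}(\epsilon) = \frac{1}{|S|} \sum_{(x, y) \in S} \mathbb{1}\left[ f(x_{\text{adv}}) = y \right],$$

$$x_{\text{adv}} \approx \arg\max_{\|x' - x\|_\infty \leq \epsilon} \mathcal{L}\left(f(x'), y\right).$$

Here $f$ is the adversarially trained classifier, $\mathcal{L}$ its classification loss, and $S$ is either the training or the test set. In practice the maximization is only solved approximately by the attack, so the measured gap depends on the attack used.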
In experiments, the authors show that the generative performance improves (relative to the baseline GAN) and that the generalization gap decreases.