Pérolat et al. propose a game-theoretic variant of adversarial training based on universal adversarial perturbations. In each round, the model is first trained for a fixed number of iterations on the current training set. Afterwards, a universal perturbation that fools the network is found, along with the corresponding perturbed test images. These adversarial examples are added to the training set. In the next round, the network is trained on this extended training set. Overall, the network is trained on a sequence of universal adversarial perturbations, each corresponding to an earlier version of that network.
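To make the alternating procedure concrete, here is a minimal sketch of the loop in NumPy. It is not the authors' implementation: the logistic-regression model, the signed-gradient attack used to find the universal perturbation, the choice to perturb the original training inputs, and all names (`train`, `universal_perturbation`, `game_training`, `eps`, `rounds`) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, X, y, steps=200, lr=0.1):
    """Train a logistic-regression model (labels in {0, 1}) by gradient descent."""
    for _ in range(steps):
        p = sigmoid(X @ w)
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def universal_perturbation(w, X, y, eps=0.5, steps=10):
    """Find ONE perturbation delta, shared by all inputs, with ||delta||_inf <= eps.

    Assumption: iterative signed-gradient ascent on the mean loss, a simple
    stand-in for whatever attack the paper actually uses.
    """
    delta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid((X + delta) @ w)
        # Gradient of the mean cross-entropy loss with respect to delta.
        grad = ((p - y)[:, None] * w[None, :]).mean(axis=0)
        delta = np.clip(delta + (eps / steps) * np.sign(grad), -eps, eps)
    return delta

def game_training(X, y, rounds=3, eps=0.5):
    """Alternate between training and adding universally perturbed examples."""
    w = np.zeros(X.shape[1])
    X_train, y_train = X.copy(), y.copy()
    deltas = []
    for _ in range(rounds):
        # 1. Train on the current (possibly extended) training set.
        w = train(w, X_train, y_train)
        # 2. Find a universal perturbation against the current model.
        delta = universal_perturbation(w, X, y, eps=eps)
        deltas.append(delta)
        # 3. Add the perturbed examples, with their original labels.
        X_train = np.concatenate([X_train, X + delta])
        y_train = np.concatenate([y_train, y])
    # Final training pass on the full accumulated set.
    w = train(w, X_train, y_train)
    return w, deltas, X_train
```

Each entry of `deltas` is the perturbation found against one earlier version of the model, so the final training set accumulates adversarial examples from the whole sequence of past models rather than only from the most recent one.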
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below.