Goldblum et al. show that distilling robustness is possible, however, depends on the teacher model and the considered dataset. Specifically, while classical knowledge distillation does not convey robustness against adversarial examples, distillation with a robust teacher model might increase robustness of the student model – even if trained on clean examples only. However, this seems to depend on both the dataset as well as the teacher model, as pointed out in experiments on Cifar100. Unfortunately, from the paper, it does not become clear in which cases robustness distillation does not work. To overcome this limitation, the authors propose to combine adversarial training and distillation and show that this recovers robustness; the student model’s robustness might even exceed the teacher model’s robustness. This, however, might be due to the additional adversarial examples used during distillation.
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: