29th March 2019

READING

Amirata Ghorbani, Abubakar Abid, James Y. Zou. Interpretation of Neural Networks is Fragile. CoRR abs/1710.10547 (2017).

Ghorbani et al. show that neural network visualization techniques, often introduced to improve interpretability, are susceptible to adversarial examples. For example, they consider common feature-importance visualization techniques and aim to find an adversarial example that does not change the predicted label but does change the original interpretation, e.g., as measured on some of the most important features. Examples of the so-called top-1000 attack, which targets the 1000 most important features, are shown in Figure 1. The general finding, i.e., that interpretations are not robust or reliable, is definitely of relevance for the general acceptance and security of deep learning systems in practice.

Figure 1: Examples of changed interpretations.
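
To make the idea of a top-k attack more concrete, here is a minimal sketch in NumPy. It is not the authors' implementation: the toy two-layer softplus network, the simple-gradient saliency, the finite-difference gradients, and all names and hyperparameters (`topk_attack`, `k`, `eps`, `step`, `iters`) are assumptions made for illustration only. The sketch iteratively reduces the importance mass on the originally top-k features, keeps the perturbation inside an L-infinity ball, and only accepts candidates that leave the predicted label unchanged.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score(x, W1, w2):
    """Scalar logit of a toy two-layer softplus network; its sign is the label."""
    return float(w2 @ softplus(W1 @ x))

def saliency(x, W1, w2):
    """Simple-gradient feature importance |d score / d x|, normalized to sum to 1."""
    grad = W1.T @ (w2 * sigmoid(W1 @ x))
    importance = np.abs(grad)
    return importance / importance.sum()

def topk_mass(x, top_idx, W1, w2):
    """Total importance assigned to the features in top_idx."""
    return float(saliency(x, W1, w2)[top_idx].sum())

def numerical_grad(f, x, h=1e-5):
    """Central finite differences (an assumption to avoid second derivatives)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def topk_attack(x, W1, w2, k=3, eps=0.1, step=0.01, iters=50):
    """Lower the importance of the originally top-k features without changing the label."""
    label = np.sign(score(x, W1, w2))
    top_idx = np.argsort(saliency(x, W1, w2))[-k:]

    def objective(z):
        return topk_mass(z, top_idx, W1, w2)

    best, best_mass = x.copy(), objective(x)
    cur = x.copy()
    for _ in range(iters):
        # Signed gradient step that decreases the top-k importance mass.
        cur = cur - step * np.sign(numerical_grad(objective, cur))
        # Project back into the L-infinity ball of radius eps around x.
        cur = x + np.clip(cur - x, -eps, eps)
        # Keep the candidate only if the label is preserved and the mass improved.
        cur_mass = objective(cur)
        if np.sign(score(cur, W1, w2)) == label and cur_mass < best_mass:
            best, best_mass = cur.copy(), cur_mass
    return best, top_idx

# Hypothetical toy data: random weights and a random 10-dimensional input.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 10))
w2 = rng.normal(size=16)
x = rng.normal(size=10)
x_adv, top_idx = topk_attack(x, W1, w2)
```

Comparing `topk_mass(x, top_idx, W1, w2)` with `topk_mass(x_adv, top_idx, W1, w2)` shows how much importance was moved away from the originally most important features. Softplus is used instead of ReLU in this toy network so that the saliency itself varies smoothly with the input.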

Also find this summary on ShortScience.org.