12th May 2017

READING

Anh Mai Nguyen, Jason Yosinski, Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. CVPR, 2015.

Nguyen et al. study how to fool deep neural networks, specifically networks trained on MNIST or ImageNet, using evolutionary algorithms. In the simpler case, the evolutionary algorithm chooses a sample from the population (here, an image) and randomly mutates it. If the mutated sample achieves a higher fitness value than the current champion, it replaces the champion. The fitness function is the highest score the deep neural network predicts over all classes, and the champion is the sample with the highest fitness value. Details can be found in the paper.
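The champion-replacement loop described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes a population of a single champion (a simple hill climber), and it replaces the trained network with a hypothetical fixed random linear softmax model (`make_toy_classifier`) so the example runs without any deep learning library. The mutation rate and noise level are illustrative assumptions; the paper describes the exact algorithm and parameters.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_classifier(num_pixels, num_classes, seed=0):
    """Hypothetical stand-in for a trained network: a fixed random linear model."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
               for _ in range(num_classes)]

    def predict(image):
        logits = [sum(w * x for w, x in zip(row, image)) for row in weights]
        return softmax(logits)

    return predict


def fitness(predict, image):
    """Fitness = highest score the classifier assigns to any class."""
    return max(predict(image))


def mutate(image, rng, rate=0.1, sigma=0.3):
    """Perturb a random subset of pixels with Gaussian noise, clipped to [0, 1]."""
    return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) if rng.random() < rate else x
            for x in image]


def evolve(predict, num_pixels, generations=500, seed=1):
    """Keep a champion and replace it only when a mutant has strictly higher fitness."""
    rng = random.Random(seed)
    champion = [rng.random() for _ in range(num_pixels)]  # random initial image
    champion_fitness = fitness(predict, champion)
    history = [champion_fitness]
    for _ in range(generations):
        candidate = mutate(champion, rng)
        candidate_fitness = fitness(predict, candidate)
        if candidate_fitness > champion_fitness:
            champion, champion_fitness = candidate, candidate_fitness
        history.append(champion_fitness)
    return champion, champion_fitness, history


if __name__ == "__main__":
    predict = make_toy_classifier(num_pixels=64, num_classes=10)
    image, score, history = evolve(predict, num_pixels=64)
    print(f"initial confidence {history[0]:.3f}, final confidence {score:.3f}")
```

Because a mutant only replaces the champion when it scores strictly higher, the recorded confidence never decreases, which is why such a simple loop can push the classifier towards very confident predictions on images that look like noise.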

They show that, using these evolutionary algorithms, it is possible to produce irregular images that networks trained on MNIST classify with very high confidence, see Figure 1. They attribute this property to the small training set. Indeed, on ImageNet it is harder for the evolutionary algorithm to produce irregular images with high confidence. Therefore, on ImageNet they change the way samples are randomly mutated in order to produce regular images that fool the deep neural network. Again, the evolutionary algorithm easily generates images that the network classifies with high confidence.
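The difference between mutating pixels directly and mutating a compact description of the image can be illustrated with a toy example. In the paper, regular images come from an indirect encoding based on compositional pattern-producing networks (CPPNs); the sketch below swaps in a much simpler hypothetical encoding in which the genome is a few `(fx, fy, phase)` sinusoid parameters. `render`, `mutate_params`, `mutate_pixels` and `roughness` are illustrative names, not taken from the paper.

```python
import math
import random


def render(params, size=32):
    """Render a size x size image (row-major list) from a small genome.

    Hypothetical indirect encoding: each (fx, fy, phase) triple adds a sinusoid
    over normalized pixel coordinates; the sum is squashed into [0, 1].
    """
    image = []
    for row in range(size):
        for col in range(size):
            x, y = col / size, row / size
            z = sum(math.sin(2 * math.pi * (fx * x + fy * y) + phase)
                    for fx, fy, phase in params)
            image.append(0.5 + 0.5 * math.tanh(z))
    return image


def mutate_params(params, rng, sigma=0.2):
    """Indirect encoding: mutate the genome, which changes the whole image coherently."""
    return [(fx + rng.gauss(0, sigma), fy + rng.gauss(0, sigma), phase + rng.gauss(0, sigma))
            for fx, fy, phase in params]


def mutate_pixels(image, rng, sigma=0.2):
    """Direct encoding: every pixel is its own gene and is perturbed independently."""
    return [min(1.0, max(0.0, p + rng.gauss(0, sigma))) for p in image]


def roughness(image, size):
    """Mean absolute difference of horizontally adjacent pixels (lower = more regular)."""
    diffs = [abs(image[r * size + c] - image[r * size + c + 1])
             for r in range(size) for c in range(size - 1)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    size = 32
    genome = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), rng.uniform(0, 2 * math.pi))
              for _ in range(3)]
    regular = render(genome, size)
    child = render(mutate_params(genome, rng), size)  # offspring stays smooth
    noisy = mutate_pixels(regular, rng)               # per-pixel noise breaks regularity
    print(roughness(regular, size), roughness(child, size), roughness(noisy, size))
```

The design point is that a single mutation of the genome moves many pixels together, so offspring remain smooth and periodic, whereas per-pixel mutations tend to produce the irregular, noise-like images seen on MNIST.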

Figure 1 (click to enlarge): Irregular images which a deep network trained on MNIST classifies with 99.9% confidence as digits between $0$ and $9$

An interesting question is whether a network can be trained to avoid "being fooled". Nguyen et al. address this simply by adding an extra "fooling images" class to the loss function. Although adding fooling images makes it harder for the evolutionary algorithms to generate new fooling images for the ImageNet network, the network merely learns features that recognize these particular generated images. Consequently, different evolutionary algorithms or random mutations might still be able to fool the network. It also means that the features used to classify the actual images from the dataset did not improve to reject images not from the class ...
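Adding a "fooling images" class amounts to relabeling the generated images with one extra class index before retraining. The helpers below are a minimal sketch, assuming the dataset is held as plain Python lists and that the extra class receives index `num_classes` (so the output layer grows by one unit). `add_fooling_class` and `is_flagged_as_fooling` are hypothetical names; the retraining step itself is omitted.

```python
from typing import List, Sequence, Tuple


def add_fooling_class(images: Sequence, labels: Sequence[int],
                      fooling_images: Sequence, num_classes: int) -> Tuple[List, List[int], int]:
    """Append generated fooling images under a new class with index `num_classes`.

    Returns the augmented images, the augmented labels and the new number of
    classes, i.e. the size the network's output layer must have for retraining.
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    fooling_label = num_classes
    new_images = list(images) + list(fooling_images)
    new_labels = list(labels) + [fooling_label] * len(fooling_images)
    return new_images, new_labels, num_classes + 1


def is_flagged_as_fooling(probs: Sequence[float], num_original_classes: int) -> bool:
    """True if the retrained network's top prediction is the extra fooling class."""
    top = max(range(len(probs)), key=lambda k: probs[k])
    return top == num_original_classes


if __name__ == "__main__":
    digits = [[0.0] * 4, [1.0] * 4]  # two toy "images"
    digit_labels = [3, 7]
    fooled = [[0.5] * 4]             # one generated fooling image
    X, y, k = add_fooling_class(digits, digit_labels, fooled, num_classes=10)
    print(y, k)                      # prints: [3, 7, 10] 11
```

This also makes the limitation above concrete: the extra class only contains images produced by one particular generator, so nothing forces the network to reject fooling images that a different evolutionary algorithm or mutation scheme would produce.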

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below.