Wieland Brendel, Matthias Bethge. Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet. CoRR abs/1904.00760 (2019).

Brendel and Bethge show empirically that state-of-the-art deep neural networks on ImageNet rely to a large extent on local features, without any notion of interaction between them. To this end, they propose a bag-of-local-features model, called BagNet, which applies a ResNet-like architecture to small patches of ImageNet images. The features extracted from these patches are then averaged over the image and a linear classifier is trained on top. Because each patch is processed independently, the model makes it possible to inspect which areas of an image contribute to the model’s decision, as shown in Figure 1. Furthermore, these local features are sufficient for good performance on ImageNet. Finally, they show, on scrambled ImageNet images, that regular deep neural networks also rely heavily on local features, without any notion of spatial interaction between them.
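To make the bag-of-local-features idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the ResNet-like patch encoder is replaced by a hypothetical stand-in that simply flattens each patch, and the function names, patch size, stride, and random weights are all assumptions chosen for illustration. What it does show is the structure described above: every patch is scored independently by a linear classifier, the image-level prediction is the average over patches, and the per-patch scores of one class directly form a heatmap.

```python
import numpy as np


def extract_patches(image, patch_size, stride):
    """Collect all patch_size x patch_size patches of an (H, W, C) image."""
    height, width, _ = image.shape
    rows = (height - patch_size) // stride + 1
    cols = (width - patch_size) // stride + 1
    patches = []
    for r in range(rows):
        for c in range(cols):
            top, left = r * stride, c * stride
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches), (rows, cols)


def bag_of_local_features(image, weights, bias, patch_size=3, stride=1):
    """Score every patch independently, then average the scores over the image."""
    patches, grid_shape = extract_patches(image, patch_size, stride)
    # Assumption: flattening stands in for the ResNet-like patch encoder.
    features = patches.reshape(len(patches), -1)
    # Linear classifier applied per patch: (num_patches, num_classes).
    patch_logits = features @ weights + bias
    # Image-level prediction is the mean over patches, so patch order is irrelevant.
    image_logits = patch_logits.mean(axis=0)
    return image_logits, patch_logits, grid_shape


# Toy example with random data (illustrative values only).
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
num_classes = 5
weights = rng.normal(size=(3 * 3 * 3, num_classes))
bias = np.zeros(num_classes)

image_logits, patch_logits, grid_shape = bag_of_local_features(image, weights, bias)

# Heatmap for the predicted class: one score per patch location.
predicted_class = int(image_logits.argmax())
heatmap = patch_logits[:, predicted_class].reshape(grid_shape)
```

Because the classifier is linear and the aggregation is a plain mean, averaging the per-patch logits gives the same result as averaging the features first and classifying afterwards. Shuffling the patches leaves the prediction unchanged, which is why such a model cannot use the spatial arrangement of its local features.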

Figure 1: Illustration of the heatmaps obtained using BagNets, the bag-of-local-features model proposed in the paper. Here, different sizes for the local patches are used.
