Geirhos et al. show that state-of-the-art convolutional neural networks rely too heavily on texture information. They support this claim with a controlled study comparing the performance of convolutional neural networks and humans on variants of ImageNet images in which texture has been removed (silhouettes) or only edges remain. Additionally, networks that consider only local information can perform nearly as well as other networks. To counter this bias, they propose a stylized ImageNet variant in which textures are replaced randomly, forcing the network to put more weight on global shape information.
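To make the idea of an edge-only image variant more concrete, here is a minimal sketch of how texture could be stripped from a grayscale image while keeping its outlines. It assumes a plain Sobel gradient filter with a relative threshold, which is my own stand-in and not necessarily the procedure used by the authors; the function name `sobel_edge_map` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def sobel_edge_map(gray, threshold=0.25):
    """Return a boolean edge map of a 2-D grayscale image.

    Assumption: Sobel gradients plus a relative threshold are used here
    only for illustration; the paper's own edge extraction may differ.
    """
    gray = np.asarray(gray, dtype=np.float64)
    # Replicate the border so the output has the same shape as the input.
    p = np.pad(gray, 1, mode="edge")
    # Horizontal gradient: weighted right column minus weighted left column.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: weighted bottom row minus weighted top row.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:
        # A perfectly flat image has no edges at all.
        return np.zeros(gray.shape, dtype=bool)
    # Keep only pixels whose gradient is strong relative to the maximum.
    return magnitude / peak > threshold


# Toy example: a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
edges = sobel_edge_map(image)
```

In the toy example, the flat interior of the square and the empty background are discarded, and only the outline of the square survives. That is roughly the kind of input on which a texture-reliant classifier has little to work with, while a human can still recognize the shape.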
What is your opinion on the summarized work? Do you know of related work that might be of interest? Let me know your thoughts in the comments below.