# DAVIDSTUTZ

17th APRIL 2017

M. Noroozi, P. Favaro. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. ECCV, 2016.

Noroozi and Favaro present a self-supervised learning task similar to the context prediction task proposed by Doersch et al. []. Specifically, they use jigsaw puzzles to teach convolutional neural networks about spatial context and thereby learn features useful for classification and detection. The overall idea is illustrated in Figure 1.

The presented architecture is shown in Figure 2 and consists of $9$ AlexNets [] with shared weights, one per input tile. The representations computed by these networks are fed into two fully connected layers and then into a softmax layer with $64$ outputs. Each output corresponds to one of the $64$ different permutations used for the input tiles.
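To make the pretext task concrete, here is a minimal sketch of how a single training sample could be built: the image is cut into a $3 \times 3$ grid of tiles, one of $64$ fixed permutations is applied, and the index of that permutation serves as the classification label. The names `make_permutation_set`, `split_into_tiles` and `make_jigsaw_sample` are hypothetical, and the permutations are simply drawn at random here, which is an assumption; how the paper selects its permutation set is not covered in this summary.

```python
import random

GRID = 3               # 3 x 3 tiles, one tile per network branch
NUM_PERMUTATIONS = 64  # one softmax output per permutation


def make_permutation_set(num_permutations=NUM_PERMUTATIONS, seed=0):
    """Draw distinct random orderings of the 9 tile positions.

    Assumption: random selection; the paper's own strategy may differ.
    """
    rng = random.Random(seed)
    permutations = set()
    while len(permutations) < num_permutations:
        permutations.add(tuple(rng.sample(range(GRID * GRID), GRID * GRID)))
    return sorted(permutations)


def split_into_tiles(image):
    """Cut a 2D image (list of rows) into GRID x GRID tiles, row-major."""
    tile_h = len(image) // GRID
    tile_w = len(image[0]) // GRID
    tiles = []
    for r in range(GRID):
        for c in range(GRID):
            tiles.append([row[c * tile_w:(c + 1) * tile_w]
                          for row in image[r * tile_h:(r + 1) * tile_h]])
    return tiles


def make_jigsaw_sample(image, permutations, rng):
    """Return (shuffled_tiles, label); the label is the permutation index."""
    label = rng.randrange(len(permutations))
    tiles = split_into_tiles(image)
    shuffled = [tiles[i] for i in permutations[label]]
    return shuffled, label


permutations = make_permutation_set()
image = [[y * 9 + x for x in range(9)] for y in range(9)]  # toy 9 x 9 "image"
shuffled_tiles, label = make_jigsaw_sample(image, permutations, random.Random(1))
```

In a full pipeline, each of the nine shuffled tiles would be passed through one of the weight-sharing networks, and `label` would be the target for the $64$-way softmax.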

They demonstrate the usefulness of the learned representations on ImageNet and Pascal VOC 2007. They also present an intuitive visualization. To this end, they compute the $L_1$ norm of feature maps in specific layers and present the top 16 patches (from different images) with the largest $L_1$ norm. This illustrates that specific feature maps in specific layers correspond to individual semantic concepts. The visualizations are shown in Figure 3.
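The ranking behind this visualization can be sketched as follows: for one fixed feature map (channel), compute the $L_1$ norm of its activation for every input and keep the inputs with the largest values. This is only a minimal sketch in plain Python; `l1_norm` and `top_k_by_l1` are hypothetical names, and the activations are assumed to be available as nested lists rather than taken from an actual network.

```python
import heapq


def l1_norm(feature_map):
    """Sum of absolute activations over a 2D feature map (list of rows)."""
    return sum(abs(value) for row in feature_map for value in row)


def top_k_by_l1(feature_maps, k=16):
    """Return indices of the k inputs with the largest L1 norm, strongest first.

    `feature_maps[i]` is the activation of one fixed channel for input i.
    """
    scored = ((l1_norm(fm), index) for index, fm in enumerate(feature_maps))
    return [index for _, index in heapq.nlargest(k, scored)]


# Toy activations of a single channel for three inputs.
activations = [
    [[0.1, -0.2], [0.0, 0.3]],  # L1 norm of roughly 0.6
    [[2.0, -1.0], [0.5, 0.5]],  # L1 norm of 4.0
    [[-1.0, 0.0], [0.0, 1.0]],  # L1 norm of 2.0
]
ranking = top_k_by_l1(activations, k=2)
```

The patches at the returned indices are the ones that would be displayed for that feature map.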

• [] C. Doersch, A. Gupta, A. A. Efros. Unsupervised Visual Representation Learning by Context Prediction. ICCV, 2015.
• [] A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
