Elad Hoffer, Itay Hubara, Nir Ailon. Deep unsupervised learning through spatial contrasting. CoRR abs/1610.00243, 2016.

Hoffer et al. propose an unsupervised (or, more precisely, self-supervised) training methodology for deep neural networks. Their work is in line with other work that tries to learn representations in a self-supervised fashion. Given an image, the proposed approach, called spatial contrasting, takes two patches from the image (an anchor patch and a random additional, positive patch) together with a "contrasting", negative patch from another image. The goal is then to simultaneously maximize the conditional probability $p(f_{anchor}|f_{positive})$ and minimize $p(f_{anchor}|f_{negative})$, where $f$ denotes the features computed for the anchor patch, the positive patch and the negative patch, respectively. For two images $x_1$ and $x_2$, with $f_i^{(j)}$ denoting the features of the $j$-th patch of image $x_i$, the loss is formed as

$L_{SC}(x_1,x_2) = -\log\frac{\exp(-\|f_1^{(1)} - f_1^{(2)}\|_2)}{\exp(-\|f_1^{(1)} - f_1^{(2)}\|_2) + \exp(-\|f_1^{(1)} - f_2^{(1)}\|_2)}$
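
To see what this loss optimizes, it can help to rewrite it in terms of the two distances involved. The shorthand $d_+$ and $d_-$ below is introduced here for illustration only and is not notation from the paper: $d_+ = \|f_1^{(1)} - f_1^{(2)}\|_2$ is the anchor-positive distance and $d_- = \|f_1^{(1)} - f_2^{(1)}\|_2$ is the anchor-negative distance. Dividing numerator and denominator by $\exp(-d_+)$ gives

$L_{SC}(x_1,x_2) = -\log\frac{\exp(-d_+)}{\exp(-d_+) + \exp(-d_-)} = \log(1 + \exp(d_+ - d_-))$

So the loss depends only on the difference $d_+ - d_-$: it equals $\log 2$ when both distances are equal and approaches zero as the negative patch moves much farther from the anchor than the positive patch.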

As the roles of the two images can be exchanged (each image can supply the anchor and positive patches while the other supplies the negative patch), the loss used for training averages both orderings:

$\hat{L}_{SC} (x_1, x_2) = \frac{1}{2}[L_{SC}(x_1, x_2) + L_{SC}(x_2, x_1)]$
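
The two formulas above can be sketched in a few lines of plain Python. This is only a minimal illustration operating on already-computed feature vectors, not the authors' implementation: the function names are hypothetical, the features are plain lists of floats, and the assumption that $L_{SC}(x_2, x_1)$ uses the first patch of $x_2$ as anchor and the first patch of $x_1$ as negative follows from swapping the indices in the formula.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spatial_contrasting_loss(f_anchor, f_positive, f_negative):
    """L_SC for one anchor/positive/negative triple of feature vectors.

    Uses the identity -log(e^{-d+} / (e^{-d+} + e^{-d-})) = log(1 + e^{d+ - d-}),
    evaluated in a numerically stable way.
    """
    d_pos = l2_distance(f_anchor, f_positive)
    d_neg = l2_distance(f_anchor, f_negative)
    z = d_pos - d_neg
    # Stable softplus: log(1 + exp(z)) = max(z, 0) + log(1 + exp(-|z|)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def symmetric_spatial_contrasting_loss(f1_a, f1_b, f2_a, f2_b):
    """Average of L_SC(x1, x2) and L_SC(x2, x1).

    f1_a, f1_b: features of the two patches from image x1.
    f2_a, f2_b: features of the two patches from image x2.
    Assumption: each image's first patch serves as its anchor and as the
    negative for the other image.
    """
    loss_12 = spatial_contrasting_loss(f1_a, f1_b, f2_a)
    loss_21 = spatial_contrasting_loss(f2_a, f2_b, f1_a)
    return 0.5 * (loss_12 + loss_21)
```

In an actual training setup the feature vectors would come from the network being trained and the loss would be averaged over a batch; both are omitted here to keep the sketch self-contained.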

This approach is also illustrated in Figure 1.

Figure 1: Illustration of the approach.
