Elad Hoffer, Itay Hubara, Nir Ailon. Deep unsupervised learning through spatial contrasting. CoRR abs/1610.00243, 2016.

Hoffer et al. propose an unsupervised (or, more precisely, self-supervised) training methodology for deep neural networks. Their work is in line with other work trying to learn representations in a self-supervised fashion. Given an image, the proposed approach, called spatial contrasting, takes two patches from the image (an anchor patch and a random additional, positive patch) together with a "contrasting", negative patch from another image. Then, the goal is to simultaneously maximize the conditional probability $p(f_{anchor}|f_{positive})$ and minimize $p(f_{anchor}|f_{negative})$ where $f$ denotes the features computed for the anchor patch, the positive patch and the negative patch. Writing $f_i^{(j)}$ for the features of the $j$-th patch of image $x_i$, the loss is formed as

$L_{SC}(x_1,x_2) = -\log\frac{\exp(-\|f_1^{(1)} - f_1^{(2)}\|_2)}{\exp(-\|f_1^{(1)} - f_1^{(2)}\|_2) + \exp(-\|f_1^{(1)} - f_2^{(1)}\|_2)}$

As either image can provide the anchor and positive patches while the other image provides the contrasting patch, the loss used for training averages both assignments:

$\hat{L}_{SC} (x_1, x_2) = \frac{1}{2}[L_{SC}(x_1, x_2) + L_{SC}(x_2, x_1)]$
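To make the two formulas concrete, here is a minimal NumPy sketch of the loss for a single pair of images. It is not the authors' implementation: the function names `sc_loss` and `symmetric_sc_loss` and the argument names are my own, and the features are assumed to be plain 1-D vectors that some network has already computed for each patch.

```python
import numpy as np

def sc_loss(f_anchor, f_positive, f_negative):
    """L_SC for one triple of feature vectors (hypothetical helper)."""
    # Distance between the two patches taken from the same image.
    d_pos = np.linalg.norm(f_anchor - f_positive)
    # Distance between the anchor and the patch from the other image.
    d_neg = np.linalg.norm(f_anchor - f_negative)
    # -log(exp(-d_pos) / (exp(-d_pos) + exp(-d_neg))),
    # rewritten as d_pos + log(exp(-d_pos) + exp(-d_neg)) for stability.
    return d_pos + np.logaddexp(-d_pos, -d_neg)

def symmetric_sc_loss(f1_a, f1_b, f2_a, f2_b):
    """Average of L_SC(x_1, x_2) and L_SC(x_2, x_1).

    f1_a, f1_b: features of two patches from image x_1 (assumed inputs).
    f2_a, f2_b: features of two patches from image x_2 (assumed inputs).
    """
    loss_12 = sc_loss(f1_a, f1_b, f2_a)  # x_1 gives anchor/positive, x_2 the contrast
    loss_21 = sc_loss(f2_a, f2_b, f1_a)  # roles of the two images swapped
    return 0.5 * (loss_12 + loss_21)
```

If the positive and the negative patch are equally far from the anchor, `sc_loss` evaluates to $\log 2$; it approaches zero as the positive patch moves closer to the anchor and the negative patch moves further away.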

This approach is also illustrated in Figure 1.

Figure 1: Illustration of the approach.
