A. Gordo, J. A. Rodríguez-Serrano, F. Perronnin, E. Valveny. Leveraging category-level labels for instance-level image retrieval. In Computer Vision and Pattern Recognition, Conference on, pages 3045–3052, Providence, Rhode Island, June 2012.

Gordo et al. propose an approach to dimensionality reduction of image representations for image retrieval. Given a training set of image representations extracted for image retrieval, they jointy learn a dimensionality reduction $P \in \mathbb{R}^{C' \times C}$ and a set of classifiers $w_t \in \mathbb{R}^{C'}, t \in \{1,\ldots,T\}$, in order to project the image representations and their labels into a common subspace. A large-margin framework is employed; the energy

$E(P) = \sum_{(x_n, t_n, t) : t \neq t_n} \max\{0, 1 - s(x_n, t_n) + s(x_n, t)\}$ with $s(x_n, t) = (P x_n)^T w_t$

is minimized using stochastic gradient descent. Here, $C$ is the dimension of the image representations and $C'$ the target dimensionality. For $n = 1,\ldots,N$, $(x_n, t_n)$ represents a pair of image representation and label from the training set. Then, $s(x_n, t)$ can be interpreted as the relevance of label $t$ to image representation $x_n$. During minimization, a triple $(x_n, t_n, t)$ with $t_n \neq t$ is sampled and the projection matrix $P$ as well as the classifiers $w_{t_n}$ and $w_t$ are updated only if the loss

$\max\{0, 1 - s(x_n,t_n) + s(x_n, t)\}$

is positive. Then, the update equations derived by Gordo et al. are as follows:

$P[\tau + 1] = P[\tau] + \gamma (w_{t_n}[\tau] - w_t[\tau])x_n^T$
$w_{t_n}[\tau + 1] = w_{t_n}[\tau] + \gamma P[\tau]x_n$
$w_t[\tau + 1] = w_t[\tau] - \gamma P[\tau] x_n$

where $\gamma$ is the learning rate and $\tau$ indexes the iterations. Finally, $P$ is used for dimensionality reduction while the classifiers $w_t$ are discarded.

What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.