T. Ge, Q. Ke, J, Sun. Sparse-coded features for image retrieval. In British Machine Vision Conference, Bristol, United Kingdom, September 2013.

Ge et al. propose a sparse-coding approach to image retrieval. Similar to other approaches (e.g. [1]), they compute a vocabulary of visual words $\hat{Y} = \{\hat{y}_1,\ldots,\hat{y}_M\}$ from the extracted descriptors of all $N$ images $Y = \bigcup_{n = 1}^N Y_n$ and apply sparse coding as embedding:

$f(y_{l,n}) = \text{arg}\min_{r_l} \|y_{l,n} - \hat{Y} r_l\|_2^2 + \lambda \|r_l\|_1$.

where $\lambda$ is a regularization parameter and $r_l$ is the sparse code computed for descriptor $y_{l,n}$. As second step, these sparse codes are pooled into a single $M$-dimensional feature vector. Max pooling is given by

$F(Y_n) = \left(\max_{1\leq l \leq L}\{f_1(y_{l,n})\},\ldots,\max_{1\leq l \leq L}\{f_M(y_{l,n})\}\right)$

where $f_m(y_{l,n})$ refers to the $m$-th components of $f(y_{l,n})$. The final image representation is $L_2$-normalized.

  • [1] J. Sivic, A. Zisserman. Video google: A text retrieval approach to object matching in videos. In Computer Vision, International Conference on, pages 1470–1477, Nice, France, October 2003.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.