I am looking for full-time (applied) research opportunities in industry, involving (trustworthy and robust) machine learning or (3D) computer vision, starting early 2022. Check out my CV and get in touch on LinkedIn!


T. Ge, Q. Ke, J, Sun. Sparse-coded features for image retrieval. In British Machine Vision Conference, Bristol, United Kingdom, September 2013.

Ge et al. propose a sparse-coding approach to image retrieval. Similar to other approaches (e.g. [1]), they compute a vocabulary of visual words $\hat{Y} = \{\hat{y}_1,\ldots,\hat{y}_M\}$ from the extracted descriptors of all $N$ images $Y = \bigcup_{n = 1}^N Y_n$ and apply sparse coding as embedding:

$f(y_{l,n}) = \text{arg}\min_{r_l} \|y_{l,n} - \hat{Y} r_l\|_2^2 + \lambda \|r_l\|_1$.

where $\lambda$ is a regularization parameter and $r_l$ is the sparse code computed for descriptor $y_{l,n}$. As second step, these sparse codes are pooled into a single $M$-dimensional feature vector. Max pooling is given by

$F(Y_n) = \left(\max_{1\leq l \leq L}\{f_1(y_{l,n})\},\ldots,\max_{1\leq l \leq L}\{f_M(y_{l,n})\}\right)$

where $f_m(y_{l,n})$ refers to the $m$-th components of $f(y_{l,n})$. The final image representation is $L_2$-normalized.

  • [1] J. Sivic, A. Zisserman. Video google: A text retrieval approach to object matching in videos. In Computer Vision, International Conference on, pages 1470–1477, Nice, France, October 2003.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: