# DAVIDSTUTZ

Check out our CVPR'18 paper on weakly-supervised 3D shape completion — and let me know your opinion! @david_stutz
05thAPRIL2016

H. Jégou, M. Douze, C. Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011.

In image retrieval systems, nearest-neighbor search is used to retrieve relevant images corresponding to a specific query image. As feature spaces tend to be high-dimensional, Jégou et al. propose product quantization for approximate nearest-neighbor search. In general, a quantizer $q$ is a function mapping each image representation to one of $M$ centroids. Essentially, a quantizer tries to reconstruct a given database by $M$ representatives and, thus, the reconstruction error can be expressed as

$MSE(q) = \int p(x_n)d(x_n, q(x_n))^2 d x_n \approx \sum_{n = 1}^N d(x_n, q(x_n))^2$.(1)

Product quantization subdivides each image representation into subvectors and each subvector is quantized separately using $k$-means clustering. The advantage of the product quantization lies in reduced memory consumption. In particular, $k$-means clustering requires to store $\mathcal{O}(MC)$ floating point values, while product quantization stores $\mathcal{O}(QM^\ast C) = \mathcal{O}(M^{\frac{1}{Q}}C)$ with $Q$ being the number of quantizers and $M = (M^\ast)^Q$. Based on the above quantization, Jégou et al. propose two approaches to approximate search within the quantized image representations. Given a query image representation $z_0$, using symmetric distance computation, the distance $d(x_n, z_0)$ (e.g. Euclidean, Manhatten or similar) is approximated by

$d(x_n, z_0) \approx \hat{d}(x_n, z_0) = d(q(x_n), q(z_0))$

In contrast, asymmetric distance computation is given as

$\hat{d}(x_n, z_0) = d(x_n, q(z_0))$

Jégou et al. provide guarantees on the error of these approximations: the distance error of symmetric distance computation is statistically bounded by $MSE(q)$, while the distance error of asymmetric distance computation is statistically bounded by $2MSE(q)$. In practice, both bounds can be computed using Equation (1).

The above approach is still based on exhaustive search. Jégou et al. use an inverted filesystem to circumventexhaustive search. Before using product quantization, the image representations are first quantized into a coarse quantization (with the number of centroids being significantly smaller compared to the product quantization). Each image representation is stored in a list assigned to the corresponding centroid of the coarse quantization. Instead of exhaustive search, only one of these lists is searched for nearest neighbors.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below or get in touch with me: