COMPUTERVISION RESEARCHSCIENTIST

RESEARCHSCIENTIST

21^{th}AUGUST2016

J. Mairal, F. Bach, J. Ponce. *Sparse Modeling for Image and Vision Processing*. Foundations and Trends in Computer Graphics and Vision, colume 8, 2014.

What is **your opinion** on the summarized work? Or do you know related work that is of interest? **Let me know** your thoughts in the comments below or get in touch with me:

Mairal, Bach and Ponce provide a profound discussion of sparse coding for computer vision. Beneath discussing data pre-processing and data analysis methods such as PCA and ICA, they also provide detailed descriptions of sparse coding and dictionary learning algorithms including their applications such as denoising, inpainting, super-resolution and recognition.

Personally, I was most interested in data pre-processing as well as gradient-descent based sparse-coding and dictionary learning approaches.

Data pre-processing.Mairal, Bach and Ponce discuss the following pre-processing techniques: centering, normalization and whitening.$x_i := x_i - \frac{1}{N}\sum_{i = 1}^N x_i$

When performed on overlapping patches, every pixel occurs in multiple patches. Therefore, the corresponding values are averaged. According to Mairal, Bach and Ponce this has a similar effect as high-pass filtering the image.$x_i := \frac{x_i}{\|x_i\|_2}$

may give poor results when applied on image patches. Especially after centering, some patches may contain few information resulting in a low $l_2$ norm. Therefore, Mairal, Bach and Ponce use$x_i := \frac{x_i}{\max(\eta, x_i)}$

with $\eta$ chosen as $0.2$ times the mean $l_2$ norm across all patches. On overlapping patches, the same approach as above is employed.$\Sigma = \frac{1}{N}\sum_{i = 1}^N x_i x_i^T$,

whitening intends to transform the image patches such that the covariance matrix is close to the identity matrix. As the covariance is positive semi-definite, the eigenvalues are non-negative and $S = \text{diag}(s_1,\ldots, s_n)$ of the eigenvalue decomposition$\Sigma = U S^2 U^T$

holds the singular values (note that $U$ is orthogonal). Whitening is the performed as$ x_i := U S' U^T$

with $S' = \text{diag}(s_1', \ldots, s_n')$ and$s_i' = \begin{cases}\frac{1}{s_i} &\quad \text{ if } |s_i| > \epsilon\\ 0 &\quad \text{ otherwise }\end{cases}$

for some $\epsilon > 0$.For applying the discussed pre-processing techniques to color patches, Mairal, Bach and Ponce make the following suggestions:

Sparse coding.Mairal, Bach and Ponce discuss several sparse coding approaches with respect to both the $l_0$ norm and the $l_1$ norm, i.e. considering the problems$\min_{\alpha \in \mathbb{R}^m} \|x - D\alpha\|_2^2 + \lambda \|\alpha\|_0$(1)

$\min_{\alpha \in \mathbb{R}^m} \|x - D\alpha\|_2^2 + \lambda \|\alpha\|_1$(2)

I was most interested in gradient descent approaches to both problems (however, they also discuss approaches such as matching pursuit, orthogonal matching pursuit etc.). For Equation (1) this results in hard thresholding as illustrated in Algorithm 1; for Equation (2), soft thresholding is illustrated in Algorithm 2.

Algorithm 1: Hard thresholding for sparse coding.

Algorithm 2: Soft thresholding for sparse coding.

Dictionary learning.For dictionary learning, Mairal, Bach and Ponce discuss several approaches including K-SVD, block coordinate descent and alternating minimization. Again, I focussed on a (stochastic) gradient descent approach addressing the problem$\min_{D \in \mathbb{R}^{n \times m}} \frac{1}{N}\sum_{i = 1}^N \min_{\alpha \in \mathbb{R}^m} \frac{1}{2} \| x_i - D \alpha\|_2^2 + \lambda \|\alpha\|_1$

Alternatively, the $l_1$ norm can be replaced by the $l_0$ norm. The resulting algorithm is shown in Algorithm 3 where the last step in each iteration describes a projection of $D$ onto the set of matrices with $l_2$ norm less than one. This prevents the dictionary columns to grow out of bounds.

Algorithm 3: Stochastic gradient descent for dictionary learning.