# DAVIDSTUTZ

SEPTEMBER2017

Philipp Krähenbühl, Vladlen Koltun. Parameter Learning and Convergent Inference for Dense Random Fields. ICML, 2013.

Krähenbühl and Koltun built on their earlier work [] (find my reading notes here) where they introduced an efficient inference algorithm for dense random fields (i.e. fully connected CRFs) based on mean field inference. In particular, the model

$E(x|I,\theta) = \sum_i \psi_i(x_i|\theta) + \sum_{i < j} \psi_{ij}(x_i,x_j|\theta)$

with Gaussian mixture pairwise models

$\psi_{ij}(x_i,x_j|\theta) = \sum_{m = 1}^C \mu^{(m)}(x_i,x_j|\theta)k^{(m)}(f_i - f_j)$

and learned unary potentials. Here $x_i$ refers to the label corresponding to pixel $i$ while $f_i$ being to the corresponding feature vector (e.g. color and coordinates). The label compatibility function can be defined or learned. The mean field approximation minimizes (see []) the KL divergence

$D(Q|P) = \sum_{i,x_i} Q_i(x_i)\log Q_i(x_i)$

$+ \sum_{i,x_i}\psi_i(x_i|\theta)Q_i(x_i) + \sum_{i < j}\sum_{x_i,x_j} \psi_{ij}(x_i,x_j|\theta)Q_i(x_i)Q_j(x_j) + \log Z(I,\theta)$

based on approximating the distribution $P(x)=\frac{1}{Z(I,θ)} \exp⁡(−E(x│I,\theta))$ with $Q(x)= \prod_i Q_i(x_i)$ to derive the basic message passing algorithm given by

$Q_i(x_i) = \frac{1}{Z_i} \exp\left(\psi_i(x_i|\theta) - \sum_{j \neq i}\sum_{x_j} \psi_{ij}(x_i,x_j|\theta)Q_j(x_j)\right)$.

As the sum in the above Equation requires $\mathcal{O}(N^2)$ operations (with $N$ being the number of pixels), Krähenbühl and Koltun introduce a simple convolution-based scheme to approximately compute the messages, running in linear time.

In this paper, Krähenbühl and Koltun further build on this scheme by presenting a variant that guarantees convergence. For this to work, they make the following two assumptions:

1. The used kernels $k^{(m)}$ are positive definite (i.e. have positive definite kernel matrix);
2. And the used label compatibility functions $\mu^{(m)}$ are negative semi-definite (i.e. a $c$ exists such that the matrix with entries $\mu_ij^{(m)}=\mu^{(m)}(l,l') + c$ is negative semi-definite).

Under these assumptions, the objective of the KL divergence is rewritten in terms of the vector $q_{i + l} = Q_i(x_i = l) \in \mathbb{R}^{N\cdot M}$ where $M$ is the number of labels and $u_{i + l} = \psi_i(x_i = l)$:

$D(Q|P) = q^T \log q + q^T u + \frac{1}{2}q^T \Psi q + \log Z(I,\theta)$(1)

where the matrix $\Psi$ is defined as follows:

$\Psi = \sum_{m = 1}^C (K^{(m)} - I) \otimes \mu^{(m)}$(2)

Here, $K^{(m)}$ is the kernel matrix, $I$ is the identity matrix, $\mu^{(m)}$ is the label compatibility matrix mentioned previously. $\otimes$ refers to the Kronecker product. Note that $\Psi$ is of size $NM \times NM$ such that the dimensions in Equation (1) match! The idea is then to decompose Equation (1) into a sum of a convex function $f(q)$ and a concave function $g(q)$. Substituting Equation (2) into Equation (1) and splitting the subtraction in the sum in Equation (2) results in the following decomposition:

$f(q) = q^T \log q - \frac{1}{2} q^T \left(\sum_{m = 1}^C I \otimes \mu^{(m)}\right)q$

$g(q) = q^T U + \frac{1}{2}q^T\left(\sum_{m = 1}^C K^{(m)} \otimes \mu^{(m)}\right)q + \log Z(I,\theta)$

The argumentation is that $K^{(m)} \otimes \mu^{(m)}$ is negative semi-definite due to both assumptions and $g(q)$, thus, concave. Conversely, $−I \otimes \mu^{(m)}$ is positive definite and $f(q)$, thus, convex. They apply the convex-concave procedure (CCCP) [] to minimize the KL divergence.

Finally, Krähenbühl and Koltun also provide an algorithm for learning all the involved parameters. Unfortunately, the derivation is more than hard to follow such that I was only able to comprehend the derivation with some extra sheets of empty paper. For details see the paper.

• [] Philipp Krähenbühl, Vladlen Koltun. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. NIPS, 2011.
• [] Alan L. Yuille, Anand Rangarajan. The Concave-Convex Procedure (CCCP). NIPS, 2001.