P. Márquez-Neila, P. Kohli, C. Rother, L. Baumela. Non-parametric Higher-Order Random Fields for Image Segmentation. ECCV, 2014.

Márquez-Neila et al. propose a non-parametric higher-order random field for image segmentation with applications on electron microscopy images [1] as well as classical images [2]. The proposed model has the form

$E(y) = \sum_{c \in \mathcal{P}} \phi(y_c |x_c) = \sum_{c \in \mathcal{P}} \phi_c(y_c)$

where $\mathcal{P}$ refers to a set of pixel cliques, $x$ is the observed image and $y$ denote the labels to be inferred. In theory, the cliques can have different shapes and sizes, but Márquez-Neila et al. concentrate on square patches of fixed size samples on a regular grid with fixed stride to represent the cliques. The cliques are, however, overlapping. The clique potential

$\phi_c(y_c) = \min_{q \in \{1,\ldots,t\}} \theta_q^c + d_q^c(y_c)$

combines two costs. First, $\theta_q^c$, the cost of clique $c$ matching pattern $q$ in $Y= \{y^1,\ldots,y^t\}$ being a set of patterns extracted from the training set. In particular, $\theta_q^c$ is computed as the log-posterior of the pattern $y^q$ coming from the observed pixels $x_c$:

$\theta_q^c = - \log\left(P(y^q | x_c)\right) = - \sum_i P(y_i^q | x_c)$.

The individual pixel-wise posteriors are learned using a classifier. Second, the cost $d_q^c(y_c)$ is computed by counting the number of pixels in the patch not matching the selected pattern $y^q$ and multiplying the count by the hyper parameter $\alpha$.

In order to minimize the energy $E(y)$ to obtain the labeling $y$, the higher-order potentials are transformed to pairwise potentials similar to [3,4]. To this end, a pattern selection variable $z_c$ is introduced for each patch (i.e. clique). This allows to transform the clique cost:

$\\phi_c(y_c) = \min_{z_c} \theta_{z_c}^c + \alpha \sum_{i \in c} \delta(y_i \neq y^{z_c}_i)$

where $y_i^q$ is the label corresponding to pixel $i$ in the pattern $y^q$. As can be seen, the clique cost was transformed into a sum of a unary and a pairwise term (note that in the pairwise term, the involved variables are $z_c$ and $y_i$ for each pixel $i$ in the clique $c$). The transformation is also illustrated in Figure 1.


Figure 1 (click to enlarge): Factor graph of a simple higher-order random field before (a) and after (b) the transformation described.

Training merely involves extracting the patterns $Y$ from a training set. At testing time, the dissimilarities, i.e. the costs, are computed after the patches/cliques have been defined. In practice, only the "closest" patterns from the set $Y$ are taken into considerations to avoid high runtimes. Inference is performed after the transformation into the pairwise form using tree-reweighted message-passing and belief propagation. For details regarding the set $Y$ and computing the costs see the paper.

  • [1] A. Lucchi, Y. Li, P. Fua. Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets. CVPR, 2013.
  • [2] J. Shotton, J. Winn, C. Rother, A. Criminisi. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. ECCV, 2006.
  • [3] P. Kohli, M. P. Kumar. Energy minimization for linear envelope MRFs. CVPR, 2010.
  • [4] C. Rother, P. Kohli, W. Feng, J. Jia. Minimizing sparse higher order energy functions of discrete variables. CVPR, 2009.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: