P. Márquez-Neila, P. Kohli, C. Rother, L. Baumela. Non-parametric Higher-Order Random Fields for Image Segmentation. ECCV, 2014.

Márquez-Neila et al. propose a non-parametric higher-order random field for image segmentation with applications on electron microscopy images [1] as well as classical images [2]. The proposed model has the form

$E(y) = \sum_{c \in \mathcal{P}} \phi(y_c |x_c) = \sum_{c \in \mathcal{P}} \phi_c(y_c)$

where $\mathcal{P}$ refers to a set of pixel cliques, $x$ is the observed image and $y$ denote the labels to be inferred. In theory, the cliques can have different shapes and sizes, but Márquez-Neila et al. concentrate on square patches of fixed size samples on a regular grid with fixed stride to represent the cliques. The cliques are, however, overlapping. The clique potential

$\phi_c(y_c) = \min_{q \in \{1,\ldots,t\}} \theta_q^c + d_q^c(y_c)$

combines two costs. First, $\theta_q^c$, the cost of clique $c$ matching pattern $q$ in $Y= \{y^1,\ldots,y^t\}$ being a set of patterns extracted from the training set. In particular, $\theta_q^c$ is computed as the log-posterior of the pattern $y^q$ coming from the observed pixels $x_c$:

$\theta_q^c = - \log\left(P(y^q | x_c)\right) = - \sum_i P(y_i^q | x_c)$.

The individual pixel-wise posteriors are learned using a classifier. Second, the cost $d_q^c(y_c)$ is computed by counting the number of pixels in the patch not matching the selected pattern $y^q$ and multiplying the count by the hyper parameter $\alpha$.

In order to minimize the energy $E(y)$ to obtain the labeling $y$, the higher-order potentials are transformed to pairwise potentials similar to [3,4]. To this end, a pattern selection variable $z_c$ is introduced for each patch (i.e. clique). This allows to transform the clique cost:

$\\phi_c(y_c) = \min_{z_c} \theta_{z_c}^c + \alpha \sum_{i \in c} \delta(y_i \neq y^{z_c}_i)$

where $y_i^q$ is the label corresponding to pixel $i$ in the pattern $y^q$. As can be seen, the clique cost was transformed into a sum of a unary and a pairwise term (note that in the pairwise term, the involved variables are $z_c$ and $y_i$ for each pixel $i$ in the clique $c$). The transformation is also illustrated in Figure 1.


Figure 1 (click to enlarge): Factor graph of a simple higher-order random field before (a) and after (b) the transformation described.

Training merely involves extracting the patterns $Y$ from a training set. At testing time, the dissimilarities, i.e. the costs, are computed after the patches/cliques have been defined. In practice, only the "closest" patterns from the set $Y$ are taken into considerations to avoid high runtimes. Inference is performed after the transformation into the pairwise form using tree-reweighted message-passing and belief propagation. For details regarding the set $Y$ and computing the costs see the paper.

  • [1] A. Lucchi, Y. Li, P. Fua. Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets. CVPR, 2013.
  • [2] J. Shotton, J. Winn, C. Rother, A. Criminisi. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. ECCV, 2006.
  • [3] P. Kohli, M. P. Kumar. Energy minimization for linear envelope MRFs. CVPR, 2010.
  • [4] C. Rother, P. Kohli, W. Feng, J. Jia. Minimizing sparse higher order energy functions of discrete variables. CVPR, 2009.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.