IAM

OCTOBER2016

READING

P. Arbelaez. Boundary Extraction in Natural Images Using Ultrametric Contour Maps. Conference on Computer Vision and Pattern Recognition, 2006.

Arbeláez introduces the notion of ultrametric contour maps (also known from later work on segmentation and contour detection [1][2] and used in the Berkeley Segmentation Benchmark). For deriving the process of ultrametric contour maps, he considers a hierarchical segmentation $S = \{S_j\}_{j = 0}^L$ where $S_0$ is the initial oversegmentation and $S_L$ the highest level segmentation. For each $j$, $S_j$ is assumed to be a partition of the domain $\Omega$ of the image. The hierarchical segmentation is assumed to fulfill

$S_L = \{\Omega\}$,

$\forall 0 \leq i \leq L, 0 \leq j \leq L, i < j \quad\Rightarrow\quad S_i \sqsubseteq S_j$.

When allowing the index $j$ to take real values, i.e. $j \in \mathbb{R}$, it is called scale parameter and the above conditions are replaced by (note that $i,j \in \mathbb{R}$):

$\forall i \leq 0\quad S_i = S_0$,(1)

$\forall i \geq L\quad S_i = \{\Omega\}$,(2)

$\forall i \leq j\quad\Rightarrow\quad S_i \sqsubseteq S_j$.(3)

Then, the stratification index $f(R)$ of a set $R \in S_j$ for some $j \in \mathbb{R}$ is defined as

$f(R) = \inf\{j \in [0, L] | R \in S_j\}$.(4)

Instead of considering the individual segmentations $S_j$, Arbeláez considered the induced contours, each contour separating two regions. Reformulating conditions (1) - (3) in terms of the contours $K = \{K_j\}_{j = 0}^L$ gives

$\forall i \leq 0\quad K_i = K_0$,

$\forall i \geq L\quad K_i = \partial \Omega$,

$\forall i \leq j\quad K_j \subseteq K_i$.

An ulrametric contour map assigns each contour $\partial \subseteq \Omega$ between two regions a saliency value:

$s(\partial) = \inf\{j \in [0, L]| \partial \not\subseteq K_j\}$.(5)

A simple example illustrating the concept of an ultrametric contour map provided by Arbeláez is shown in Figure 1.

ucm_arbelaez

Figure 1: From left to right: $K_0$, $K_1$, $K_2$ (where black pixels indicate contour pixels), ultrametric contour map (white resembles higher value), ultrametric contour map in 3D,

The name ultrametric contour map comes from the duality of Equations (4) and (5) and the fact that the distance

$d(x, y) = \inf\{f(R) | x \in R, y \in R, R \in S_j \text{ for some }j\}$

is ultrametric.

The goal of Arbeláez is to design appropriate ultrametrics for hierarchical segmentation such that the generated boundaries resemble the object boundaries in the image. The induced distance/dissimilarity between regions can then be used in any region merging algorithm to generate hierarchical segmentations.

Arbeláez considers three ultrametrics:

  • Contrast ultrametric: The initial segmentation $S_0$ is used to define a distance between colors. The local contrast at a point $x \in \Omega$ is defined as

    $\tau(x) = \sup\{\|y - z\|_2 |\forall y, z \in B_r(x)\}$

    where $B_r(x)$ is a local neighborhood around $x$ and the Euclidean distance is applied in Lab color space. The contrast ultrametric is then defined as

    $d_c(R_i, R_j) = \frac{\sum_{x \in \partial_{ij}} \tau(x)}{\sum_{x \in \partial_{ij}} 1}$

    where $\partial_{ij}$ denotes the boundary between the two regions. Note that here we assume $\Omega$ to be a set of discrete pixel locations. The continuous formulation can be found in the paper.
  • Boundary ultrametric: The boundary ultrametric takes the result of a local edge detector into account. Given an edge map $g$, the boundary ultrametric is given as

    $d_b(R_i, R_j) = d_c(R_i, R_j) + \alpha \frac{\sum_{x \in \partial{ij}} g(x)}{\sum_{x \in \partial_{ij}} 1}$

    with $\alpha$ being a weighting term.
  • Boundary ultrametric: The ultrametrics discussed above do not take into account the interior of the regions. Thus, the boundary ultrametric multiplies $d_b$ by

    $min(A(R_i), A(R_j))^\beta$

    with $\beta \geq 0$ and

    $A(R) = \left(\sum_{x \in R} 1 \right)\left(\sum_{x \in R} \|x - \mu(R)\|_2^2\right)$

    where $\mu(R)$ is the mean color of region $R$.

Unfortunately, Arbeláez does not describe the generation of the initial oversegmentation $S_0$ in detail:

"..., we produce an oversegmentation by considering the extrema of the L channel in the original image and associating each point of $\Omega$ to the closest extremum in the sense of $d_R$."

  • [1] P. Arbeláez J. Pont-Tuset, J. T. Barron, F. Marqués, J. Malik. Multiscale Combinatorial Grouping.Conference on Computer Vision and Pattern Recognition, 2014.
  • [2] P. Arbeláez, M. Maire, C. Fowlkes and J. Malik. Contour Detection and Hierarchical Image Segmentation. Pattern Analysis and Machine Intelligence, Vol. 33, No. 5, 2011.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.