Efficient High-Quality Superpixels: SEEDS Revised

Overview

SEEDS Revised is an implementation of the superpixel algorithm SEEDS proposed in [1] and used for evaluation purposes in [2]. Based on color histograms, an initial superpixel segmentation is iteratively refined. The algorithm is able to generate high-quality superpixel segmentations in near realtime. The implementation is based on OpenCV (compatible with both OpenCV 2 and OpenCV 3), provides thorough documentation and a clean API. Further, a one-parameter variant requiring only the desired number of superpixels is provided:

cv::Mat image = cv::imread(filepath);
int superpixels = 800;
int iterations = 2;

SEEDSRevisedMeanPixels seeds(image, superpixels);
seeds.initialize();
seeds.iterate(iterations);

// Retrieve labels and put out contour image.
int** labels = seeds.getLabels();

SEEDS Revised on GitHub

The bachelor thesis [2] is available online: Superpixels / SEEDS

You may also go directly to Usage and Parameters instead of learning more about the actual algorithm.

Algorithm

Given an image $I: \{1, \ldots, W\} \times \{1, \ldots, H\} \rightarrow \{0, \ldots, 255\}^3$, the algorithm starts by grouping pixels into blocks of size $w^{(1)} \times h^{(1)}$. These blocks are then further arranged into blocks of $2 \times 2$. This scheme is applied recursively to form a hierarchy of blocks such that blocks at level $l$ have size

$w^{(l)} = w^{(1)} \cdot 2^{l-1}$ and $h^{(l)} = h^{(1)} \cdot 2^{l-1}$.

This is repeated for a total of $L$ levels. Therefore, the number of superpixels is calculated as follows:

$S = \frac{W}{w^{(L)}} \cdot \frac{H}{h^{(L)}}$

where we assume the divisions to be integer divisions. In practice, this means that the number of levels $L$ and the initial block size $w^{(1)} \times h^{(1)}$ need to be dervied from the desired number of superpixels $K$.

The initial superpixel segmentation is refined by exchanging blocks of pixels as well as single pixels between neighboring superpixels. Therefore, let $B_i^{(l)} \subseteq \{1,\ldots, W\}\times\{1,\ldots, H\}$ be a block of pixels at level $l$ and $S_j$ be a superpixel. With $h_{B_i^{(l)}}$ we denote the color histogram of block $B_i^{(l)}$. Then, the similarity of block $B_i^{(l)}$ and superpixel $S_j$ is given by the intersection of the corresponding histograms:

$\cap(h_{B_i^{(l)}}, h_{S_j}) = \sum_{q = 1}^Q \min\{h_{B_i^{(l)}}(q), h_{S_j}(q)\}$

where we assume the histograms to be normalized. Furthermore, the similarity of a single pixel $x_n$ and the superpixel $S_j$ is given by

$h_{S_j}(h(x_n))$

where $h(x_n)$ is the histogram bin of pixel $x_n$. Given this basic framework, algorithm 1 describes a simple implementation of SEEDS.

function SEEDS(
        $I$, // Color image.
        $w^{(1)} \times h^{(1)}$, // Initial block size.
        $L$, // Number of levels.
        $Q$ // Histogram size.
    )
    initialize the block hierarchy and the initial superpixel segmentation
    // Initialize histograms for all blocks and superpixels:
    for $l = 1$ to $L$
        // At level $l = L$ these are the initial superpixels:
        for each block $B_i^{(l)}$ 
            initialize histogram $h_{B_i^{(l)}}$
    // Block updates:
    for $l = L - 1$ to $1$
        for each block $B_i^{(l)}$
            let $S_j$ be the superpixel $B_i^{(l)}$ belongs to
            if a neighboring block belongs to a different superpixel $S_k$
                if $\cap(h_{B_i^{(l)}}, h_{S_k}) > \cap(h_{B_i^{(l)}}, h_{S_j})$
                    $S_k := S_k \cup B_i^{(l)}$, $S_j := S_j - B_i^{(l)}$
    // Pixel updates:
    for $n = 1$ to $W\cdot H$
        let $S_j$ be the superpixel $x_n$ belongs to
        if a neighboring pixel belongs to a different superpixel $S_k$
            if $h_{S_k}(h(x_n)) > h_{S_j}(h(x_n))$
                $S_k := S_k \cup \{x_n\}$, $S_j := S_j - \{x_n\}$
    return $S$

Algorithm 1: The basic algorithm of SEEDS. In practice, both block updates as well as pixel updates can be iterated more than once.

As alternative, the similarity of a pixel $x_n$ and a superpixel $S_j$ can be expressed as distance:

$d(x_n, S_j) = \|I(x_n) - I(S_j)\|_2$

where $I(S_j)$ is the mean color of the superpixel $S_j$. This variant of pixel updates is called mean pixel updates [1].

As smooth and compact superpixels are preferrable, van den Bergh et al. [1] introduced a smoothing term: For each pair of pixels $(x_n, x_m)$ (assuming that these pixels belong to different superpixels $S_j$ and $S_k$, respectively), we consider the $3 \times 4$ or $4\times 3$ neighborhood and count the number $N_{n,m,j}$ of pixels belonging to superpixel $S_j$ as well as the number $N_{n,m,k}$ of pixels belonging to superpixel $S_k$. Then, the criterion for pixel updates changes to

$N_{n,m,k}\cdot h_{S_k}(h(x_n)) > N_{n,m,j}\cdot h_{S_j}(h(x_n))$.

This way, we encourage individual pixels to belong to the same superpixel as its neighboring pixels and, thus, enforce smoothness. Of course, this approach can also be applied to mean pixel updates, however, mean pixel updates can be extendend to include an explicit compactness term by using the distance

$d(x_n, S_j) = \|I(x_n) - I(S_j)\|_2 + \beta \|x_n - \mu(S_j)\|_2$, $x_n \in \{1, \ldots, W\} \times \{1, \ldots, H\}$,(∗)

where $\mu(S_j)$ is the mean position of the superpixel $S_j$. The procedure given in algorithm 1 can easily be extended to include the above considerations, see [2].

Complexity

SEEDS runs linear in the total number $N = W\cdot H$ of pixels: SEEDS $\in \mathcal{O}(QTN)$ where $T$ is the number of iterations of block and pixel udpates and $Q$ the histogram size. This can easily be derived when considering to pre-compute the block and superpixel histograms - the histograms can then be updated whenever a block or pixel changes its label.

While the above statement is true, in practice, the runtime of SEEDS does also depend on the number of levels $L$ as well as the block size $w^{(1)} \times h^{(1)}$. Therefore, the runtime only depends indirectly on the number of superpixels $K$ - in particular, the runtime does not scale with $K$.

Usage and Parameters

Compiling

The implementation is based on CMake; e.g. on Ubuntu, CMake can be installed using

$ sudo apt-get install cmake

In addition, OpenCV is required (compatible with both OpenCV 2 and OpenCV 3). See the website for installation details. Then, the implementation can be compiled using

$ git clone https://github.com/davidstutz/seeds-revised.git
$ cd seeds-revised/build
$ cmake ..
$ make
$ ../bin/cli --help
Allowed options:
    --help                       produce help message
    --input arg                  the folder to process, may contain several 
                              images
    --bins arg (=5)              number of bins used for color histograms
    --neighborhood arg (=1)      neighborhood size used for smoothing prior
    --confidence arg (=0.11)     minimum confidence used for block update
    --iterations arg (=2)        iterations at each level
    --spatial-weight arg (=0.25) spatial weight
    --superpixels arg (=400)     desired number of supüerpixels
    --verbose                    show additional information while processing
    --csv                        save segmentation as CSV file
    --contour                    save contour image of segmentation
    --labels                     save label image of segmentation
    --mean                       save mean colored image of segmentation
    --output arg (=output)       specify the output directory (default is 
                              ./output)

Usage

The implementation provides two classes: SEEDSRevised and SEEDSRevisedMeanPixels where the latter implements the so called mean pixel updates (see above or [2]). The following example shows how to use the SEEDSRevisedMeanPixels class to oversegment a given image:

#include &lt;opencv2/opencv.hpp&gt;
#include "SeedsRevised.h"
#include "Tools.h"

// ...

cv::Mat image = cv::imread(filepath);

// Number of desired superpixels.
int superpixels = 800;

// Number of bins for color histograms (per channel).
int numberOfBins = 5;

// Size of neighborhood used for smoothing term, see [1] or [2].
// 1 will be sufficient, >1 will slow down the algorithm.
int neighborhoodSize = 1;

// Minimum confidence, that is minimum difference of histogram
// intersection needed for block updates: 0.1 is the default value.
float minimumConfidene = 0.1;

// The weighting of spatial smoothing for mean pixel updates - the
// euclidean distance between pixel coordinates and mean superpixel
// coordinates is used and weighted according to:
//  (1 - spatialWeight)*colorDistance + spatialWeight*spatialDistance
// The higher spatialWeight, the more compact superpixels are generated.
float spatialWeight = 0.25;

// Number of iterations at each level.
int iterations = 2;

// Instantiate a new object for the given image.
SEEDSRevisedMeanPixels seeds(image, superpixels, numberOfBins, neighborhoodSize, minimumConfidence, spatialWeight);

// Initializes histograms and labels.
seeds.initialize();
// Run block and pixel updates.
seeds.iterate(iterations);

// Save a contour image to the following location:
std::string storeContours = "./contours.png";

// BGR color for contours:
int bgr[] = {0, 0, 204};

// seeds.getLabels() returns a two-dimensional array containing the
// computed superpixel labels.
cv::Mat contourImage = Draw::contourImage(seeds.getLabels(), image, bgr);
cv::imwrite(store, contourImage);

Some details on the parameters:

image: the image in the form of a cv::Mat in BGR color space.
superpixels: number of desired superpixels.
numberOfBins: number of bins per channel used to calculate the color histograms (see above).
neighborhoodSize: size of the smoothing window as explained above; $1$ is usually sufficient, greater values will slow down the algorithm and result in smoother superpixels.
minimumConfidence: minimum difference of histogram intersection required for block updates.
spatialWeight: only available for SEEDSRevisedMeanPixels; used to weight the compactness term in (∗).

Qualitative and Quantitative Results

Figure 1 shows superpixel segmentations acquired on images from the Berkeley Segmentation Dataset [3].

Figure 1 (click to enlarge): Superpixel segmentations of images taken from the Berkeley Segmentation Dataset generated by: first row - original implementation of SEEDS; second row - original implementation of SEEDS with mean pixel updates; third row - SEEDS Revised; fourth row - SEEDS Revised with mean pixel updates; fifth row - SEEDS Revised with mean pixel updates and compactness term (∗).

Both SEEDS Revised, referred to as reSEEDS, and the original implementation, referred to as oriSEEDS, have been evaluated on training sets of the Berkeley Segmentation Dataset (blue) and the NYU Depth V2 Dataset (red) [4] using an extended version of the Berkeley Segmentation Benchmark [3] which is available on GitHub. Both implementations were evaluated with mean pixel updates and the neighborhood smoothing prior; reSEEDS* additionally uses the compactness term in equation (∗). Figure 2 compares both implementations and shows the influence of the additional compactness term as well as the histogram size. For additional comparison to other state-of-the-art approaches see Superpixels / SEEDS.

Figure 2 (click to enlarge): Evaluation results on the training sets of the Berkeley Segmentation Dataset (blue) and the NYU Depth Dataset V2 (red). $Rec$ denotes Boundary Recall, $UE$ denotes Undersegmentation Error, see here. Furthermore, $Q$ is the histogram size per channel, $\beta$ is the weighting term used in equation (∗).

Runtime

Table 1 compares the runtime of both implementations on the Berkeley Segmentation Dataset and the Nyu Depth V2 Dataset, see the caption for details.

Berkeley Segmentation Dataset, $N = 481 \times 321 = 154401$
$K$	oriSEEDS	reSEEDS	reSEEDS*
200	0.14045	0.06745	0.06955
400	0.23605	0.112	0.1156
600	0.3354	0.14085	0.14665
800	0.17285	0.08345	0.08505
1040	0.21295	0.09565	0.09985
NYU Depth V2 Dataset, $N = 608 \times 448 = 272384$
$K$	oriSEEDS	reSEEDS	reSEEDS*
450	0.30215	0.1386	0.14455
700	0.41365	0.18035	0.18795
840	0.21645	0.1085	0.11398
1100	0.2598	0.12515	0.13165
1400	0.3068	0.141	0.14743

Table 1: Runtime in seconds of both implementations for the Berkeley Segmentation Dataset and the NYU Depth V2 Dataset. The following parameters were used: $T = 2$ iterations of block updates at each level; $2T = 4$ iterations of pixel updates; $Q = 7^3$.

The runtime of both implementations can be reduced by reducing the number of iterations $T$ as well as the histogram size $Q$. In addition, the neighborhood smoothing prior can be dropped - in particular, as reSEEDS* provides the additional compactness term of equation (∗), the neighborhood smoothing prior can be dropped without reducing compactness or smoothness of superpixels. Table 2 shows the runtime with $T = 1$ and $Q = 3^3$ without using the neighborhood smoothing prior.

Berkeley Segmentation Dataset, $N = 481 \times 321 = 154401$
$K$	oriSEEDS	reSEEDS	reSEEDS*
200	0.05845	0.0292	0.02495
400	0.06345	0.03525	0.03545
600	0.0734	0.0401	0.05625
800	0.05415	0.04245	0.03305
1040	0.0625	0.0494	0.03465
NYU Depth V2 Dataset, $N = 608 \times 448 = 272384$
$K$	oriSEEDS	reSEEDS	reSEEDS*
450	0.10583	0.0712	0.058125
700	0.11395	0.09635	0.0563
840	0.1128	0.073075	0.049125
1100	0.11728	0.051375	0.067825
1400	0.13737	0.05395	0.06495

Table 1: Runtime in seconds of both implementations for the Berkeley Segmentation Dataset and the NYU Depth V2 Dataset. The following parameters were used: $T = 1$ iterations of block updates at each level; $2T = 2$ iterations of pixel updates; $Q = 3^3$.

References

[1] M. van den Bergh, X. Boix, G. Roig, B. de Capitani, L. van Gool. SEEDS: Superpixels extracted via energy-driven sampling. European Conference on Computer Vision, pages 13–26, 2012.
[2] D. Stutz. Superpixel Segmentation using Depth Information. Bachelor thesis, RWTH Aachen University, Aachen, Germany, 2014.
[3] P. Arbeláez, M. Maire, C. Fowlkes, J. Malik. Contour detection and hierarchical image segmentation. Transactions on Pattern Analysis and Machine Intelligence, 33(5):898–916, 2011.
[4] N. Silberman, D. Hoiem, P. Kohli, R. Fergus. Indoor segmentation and support inference from RGBD images. European Conference on Computer Vision, pages 746–760, 2012.

IAM

DAVIDSTUTZ

ARTICLE