C. Xu, J. J. Corso. Evaluation of Super-voxel Methods for Early Video Processing. Conference on Computer Vision and Pattern Recognition, 2012.

Xu and Corso implement and evaluate several supervoxel algorithms, including the approach proposed by Grundmann et al. [1]. The evaluation is based on 3D variants of Undersegmentation Error, Achievable Segmentation Accuracy, and Boundary Recall. Both the benchmark and the implementations are published as part of a library called libsvx, which is available on the project's webpage.

Note that their formulation of the Undersegmentation Error of a supervoxel segmentation $S = \{S_i\}$, $S_i \subseteq \{1,\ldots,H\} \times \{1,\ldots,W\} \times \{1,\ldots, T\} =: V$, with respect to a ground truth segmentation $G = \{G_j\}$, $G_j \subseteq V$, defined as

$UE(S,G) = \frac{1}{|G|} \sum_{G_j \in G} \frac{\left(\sum_{S_i \cap G_j \neq \emptyset} |S_i|\right) - |G_j|}{|G_j|}$

is not constrained to lie in $[0,1]$, since the numerator can exceed $|G_j|$ whenever large supervoxels overlap a small ground truth segment. Therefore, the results are hard to interpret and to compare across datasets, or even across different video sequences. A generalization of the formulation given by Neubert and Protzel [2] to video sequences seems more appropriate.
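
To make the point concrete, here is a minimal NumPy sketch, not taken from libsvx. The function names `undersegmentation_error` and `undersegmentation_error_np` are hypothetical, and both assume that the supervoxel segmentation and the ground truth are given as integer label volumes of shape $H \times W \times T$. The first function follows the formula above term by term. The second is my reading of how the Neubert and Protzel formulation [2] would carry over to voxels: each supervoxel contributes the smaller of its part inside and its part outside a ground truth segment, normalized by the number of voxels $|V|$.

```python
import numpy as np


def undersegmentation_error(sv, gt):
    """UE as in the formula above (hypothetical helper, not the libsvx API).

    sv, gt: integer label volumes of identical shape (H, W, T).
    """
    sv = np.asarray(sv).ravel()
    gt = np.asarray(gt).ravel()
    # sv_idx maps every voxel to a dense supervoxel index, sv_sizes holds |S_i|.
    _, sv_idx, sv_sizes = np.unique(sv, return_inverse=True, return_counts=True)
    gt_labels = np.unique(gt)

    total = 0.0
    for g in gt_labels:
        mask = gt == g
        size = mask.sum()                      # |G_j|
        overlapping = np.unique(sv_idx[mask])  # all S_i with S_i ∩ G_j ≠ ∅
        covered = sv_sizes[overlapping].sum()  # sum of |S_i| over those S_i
        total += (covered - size) / size
    return total / len(gt_labels)


def undersegmentation_error_np(sv, gt):
    """Assumed voxel-wise generalization of the Neubert and Protzel UE."""
    sv = np.asarray(sv).ravel()
    gt = np.asarray(gt).ravel()
    _, sv_idx, sv_sizes = np.unique(sv, return_inverse=True, return_counts=True)
    gt_labels, gt_idx = np.unique(gt, return_inverse=True)

    # overlap[i, j] = |S_i ∩ G_j|, accumulated voxel by voxel.
    overlap = np.zeros((len(sv_sizes), len(gt_labels)), dtype=np.int64)
    np.add.at(overlap, (sv_idx, gt_idx), 1)

    # min(|S_i ∩ G_j|, |S_i \ G_j|); pairs without overlap contribute 0.
    leak = np.minimum(overlap, sv_sizes[:, None] - overlap)
    return leak.sum() / sv.size


# Toy volume with H = W = T = 2: one supervoxel covers everything, while
# the ground truth contains a single-voxel segment.
gt = np.zeros((2, 2, 2), dtype=int)
gt[1, 1, 1] = 1
sv = np.zeros((2, 2, 2), dtype=int)

print(undersegmentation_error(sv, gt))     # (1/7 + 7) / 2, well above 1
print(undersegmentation_error_np(sv, gt))  # 2 / 8 = 0.25
```

In this toy case the single-voxel ground truth segment alone contributes $(8 - 1)/1 = 7$ to the first measure, whereas the second stays in $[0,1]$ because each supervoxel can contribute at most its own size to the sum.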

  • [1] M. Grundmann, V. Kwatra, M. Han, I. Essa. Efficient Hierarchical Graph Based Video Segmentation. Conference on Computer Vision and Pattern Recognition, 2010.
  • [2] P. Neubert, P. Protzel. Superpixel benchmark and comparison. Forum Bildverarbeitung, 2012.