IAM

16th June 2016

READING

L. Sevilla-Lara, D. Sun, V. Jampani, M. J. Black. Optical Flow with Semantic Segmentation and Localized Layers. Computing Research Repository, abs/1603.03911, 2016.

Sevilla-Lara et al. build largely upon the work by Sun et al. [1], incorporating semantic segmentation into the proposed layered model for optical flow. Furthermore, the approach by Sun et al. is slightly altered into what the authors call "localized layers". Their approach starts from the following two preprocessing steps:

  • Compute a semantic segmentation using the approach of [3] (i.e. the VGG network [2] is transformed into a fully-convolutional model and fine-tuned to predict the necessary classes);
  • compute an initial flow field using Discrete Flow [4].

Given the semantic segmentation, the initial flow field is refined depending on the class:

  • Planes (e.g. roads, sky and water) are modeled using planar motion by fitting a homography using RANSAC, which then defines the motion of each pixel belonging to the Plane class;
  • Stuff (e.g. buildings, vegetation and unknown regions) is modeled directly using the initial flow field;
  • Things (common foreground objects with bounded extent, e.g. cars, pedestrians and animals) are modeled using affine motion as in [1], with the difference that the graphical model is not applied globally but only to the patch containing the object at hand; note that objects are obtained after refining the semantic segmentation using a CRF and computing connected components.
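The Plane step above can be sketched in a few lines of numpy: fit a homography to sparse point correspondences with RANSAC, then read off the motion of every pixel in the planar region as u(x) = Hx − x. This is only a minimal illustration of the idea, not the authors' code; all function names are my own.

```python
import numpy as np

def fit_homography(src, dst):
    """DLT: fit H with dst ~ H @ src from >= 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=500, thresh=1.0, rng=None):
    """Robustly fit a homography to noisy correspondences via RANSAC."""
    rng = rng or np.random.default_rng(0)
    n = len(src)
    best_H, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(n, 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        # Project all source points and count inliers within the threshold.
        pts = np.c_[src, np.ones(n)] @ H.T
        proj = pts[:, :2] / pts[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = (err < thresh).sum()
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H

def planar_flow(H, xs, ys):
    """Dense flow induced by H at pixels (xs, ys): u(x) = Hx - x."""
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1) @ H.T
    proj = pts[..., :2] / pts[..., 2:3]
    return proj - np.stack([xs, ys], axis=-1)
```

With a handful of correct matches on a road or sky region, the fitted homography assigns a consistent motion to every pixel of that region, which is exactly what makes the Plane class cheap to model compared to the layered inference used for Things.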

At the time of publication, Sevilla-Lara et al. reported the leading result on the KITTI dataset [6] (see the official leaderboard), and the qualitative results look promising on both the KITTI dataset and selected Youtube sequences, see Figure 1.


Figure 1: Results of the proposed semantic optical flow approach on the KITTI dataset [6] compared to Discrete Flow [4]. Furthermore, Sevilla-Lara et al. also present qualitative results on selected Youtube sequences.
  • [1] D. Sun, E. B. Sudderth, M. J. Black. Layered segmentation and optical flow estimation over time. Conference on Computer Vision and Pattern Recognition, 2013.
  • [2] K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition. Computing Research Repository, abs/1409.1556, 2014.
  • [3] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille. Semantic segmentation with deep convolutional nets and fully connected CRFs. Computing Research Repository, abs/1412.7062, 2014.
  • [4] M. Menze, C. Heipke, A. Geiger. Discrete optimization for optical flow. German Conference on Pattern Recognition, 2015.
  • [6] A. Geiger, P. Lenz, C. Stiller, R. Urtasun. Vision meets robotics: the KITTI dataset. International Journal of Robotics Research, volume 32, number 11, 2013.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below!