Nima Sedaghat Alvar, Mohammadreza Zolfaghari, Thomas Brox. Orientation-boosted Voxel Nets for 3D Object Recognition. CoRR, abs/1604.03351, 2016.

Sedaghat et al. analyze the influence of orientation boosting, i.e. introducing orientation estimation as an auxiliary task, on 3D shape and object recognition with VoxNets [1]. On several datasets, including ModelNet [2] and KITTI [3], they integrate this auxiliary task in three different ways, as illustrated in Figure 1. The corresponding loss functions are composites of the individual cross-entropy losses.
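To make the idea of a composite loss concrete, here is a minimal sketch that combines a classification cross-entropy and an orientation cross-entropy into one scalar. The convex weighting with `gamma` and all function names are assumptions for illustration; the exact weighting used in the paper is not stated in this summary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of a single sample given the target class index."""
    return -math.log(softmax(logits)[target])


def composite_loss(class_logits, class_target,
                   orient_logits, orient_target, gamma=0.5):
    """Weighted sum of the two cross-entropy terms.

    gamma is a hypothetical trade-off weight: gamma = 0 recovers the
    plain classification loss, gamma = 1 trains on orientation only.
    """
    loss_class = cross_entropy(class_logits, class_target)
    loss_orient = cross_entropy(orient_logits, orient_target)
    return (1.0 - gamma) * loss_class + gamma * loss_orient
```

With `gamma = 0` the orientation term vanishes, which makes it easy to compare against a baseline trained on classification alone.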


Figure 1: The three different approaches proposed to integrate orientation estimation into the original classification task. First, the orientations are binned and the resulting classes are appended to the original classes. Second, the original and orientation classes are interpreted as an additional layer and the original classes are stacked on top. Third, the orientation classes are understood as an additional layer and the original classes are supposed to be inferred from the orientation labels.
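As a rough illustration of the first approach, the sketch below bins a rotation angle into discrete orientation classes and appends them to the object classes in a single target vector. This is only one reading of the caption; the bin count, the use of degrees, and the helper names `orientation_bin` and `appended_target` are assumptions, not taken from the paper.

```python
def orientation_bin(angle_deg, num_bins):
    """Map an angle in degrees to one of num_bins equal-width bins."""
    width = 360.0 / num_bins
    index = int((angle_deg % 360.0) // width)
    # Guard against floating-point wrap-around landing exactly on 360.
    return min(index, num_bins - 1)


def appended_target(class_index, num_classes, bin_index, num_bins):
    """Target vector: object classes first, orientation bins appended.

    One entry is active in each of the two groups.
    """
    target = [0] * (num_classes + num_bins)
    target[class_index] = 1
    target[num_classes + bin_index] = 1
    return target
```

For example, with 8 bins each bin covers 45 degrees, so an object rotated by 100 degrees falls into bin 2.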

In experiments, they show improved classification accuracy when incorporating orientation estimation. However, the results do not clearly favor one of the proposed approaches; the best numbers are spread seemingly at random across the three architectures. Based on the results, Sedaghat et al. also introduce a visualization technique they call dominant signal paths. These paths are computed by backtracking the most significant units contributing to the predicted class. Through this new visualization, they show that their architectures become sensitive to orientation, in contrast to the original VoxNet.
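The backtracking idea can be sketched on a tiny fully connected network: starting from the predicted class, step back one layer at a time and keep the unit with the largest contribution to the unit selected so far. Measuring contribution as weight times activation, the bias-free ReLU layers, and the function names are assumptions for illustration; the paper's exact significance criterion may differ.

```python
def forward(x, weights):
    """Run a bias-free ReLU network; return the activations of every layer.

    weights[i][k][j] connects unit j of layer i to unit k of layer i + 1.
    """
    activations = [list(x)]
    for i, layer in enumerate(weights):
        prev = activations[-1]
        z = [sum(w * a for w, a in zip(row, prev)) for row in layer]
        if i < len(weights) - 1:  # no ReLU on the output layer
            z = [max(0.0, v) for v in z]
        activations.append(z)
    return activations


def dominant_path(x, weights):
    """Unit indices from input to output along the strongest contributions."""
    acts = forward(x, weights)
    output = acts[-1]
    unit = max(range(len(output)), key=lambda k: output[k])  # predicted class
    path = [unit]
    for i in range(len(weights) - 1, -1, -1):
        prev = acts[i]
        row = weights[i][unit]
        # Contribution of each lower unit to the currently selected unit.
        unit = max(range(len(prev)), key=lambda j: row[j] * prev[j])
        path.append(unit)
    path.reverse()
    return path


# Usage: a 2-2-2 network where the first layer swaps its two inputs.
w1 = [[0.0, 1.0], [1.0, 0.0]]
w2 = [[1.0, 0.0], [0.0, 1.0]]
path = dominant_path([1.0, 2.0], [w1, w2])  # input 1 -> hidden 0 -> output 0
```

Comparing such paths for the same object under different rotations is one way to check whether a network reacts to orientation.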

Unfortunately, the presented attempts to incorporate orientation estimation are quite simple: the architecture is left unmodified and the orientation classes are always predicted at the top of the architecture. Still, the visualization technique has the potential to guide architecture design to incorporate orientation.

  • [1] D. Maturana, S. Scherer. VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition. IROS, 2015.
  • [2] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, J. Xiao. 3D ShapeNets: A Deep Representation for Volumetric Shapes. CVPR, 2015.
  • [3] A. Geiger, P. Lenz, R. Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. CVPR, 2012.