Bo Li. 3D Fully Convolutional Network for Vehicle Detection in Point Cloud. CoRR, 2016.

Li discusses a simple 3D fully convolutional network for 3D object detection on KITTI []. The approach is quite simple; the architecture is summarized in Figure 1. A fully convolutional network naturally extends to the 3D domain, operating on an occupancy grid where each voxel is either occupied (value $1$) or empty. The output of the network is an objectness score and a bounding box per voxel, although the output is subsampled to $\left(\frac{1}{4}\right)^3$ of the input resolution. At test time, the bounding boxes of voxels classified as objects are clustered to obtain the final predictions, which amounts to an implicit non-maximum suppression. The bounding boxes are encoded by their corners in 3D. The model is evaluated on the KITTI dataset; without discussing the numbers, Figure 2 shows qualitative results.
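
To make the input and output representations more concrete, here is a minimal NumPy sketch of the three pieces described above: building a binary occupancy grid from a point cloud, encoding a box by its eight corners (24 numbers), and grouping the boxes of positive voxels into detections. This is not the paper's implementation. The function names, the grid extent, the yaw-only rotation, the score threshold, and the greedy distance-based grouping are all assumptions made for illustration; the paper's exact clustering procedure is not reproduced here.

```python
import numpy as np


def voxelize(points, grid_min, voxel_size, grid_shape):
    """Binary occupancy grid: 1 where at least one point falls into the voxel, else 0."""
    points = np.asarray(points, dtype=np.float64)
    idx = np.floor((points - np.asarray(grid_min)) / voxel_size).astype(np.int64)
    # Discard points that fall outside of the grid.
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=np.uint8)
    ix, iy, iz = idx[inside].T
    grid[ix, iy, iz] = 1
    return grid


def box_corners(center, size, yaw):
    """Eight corners (8 x 3) of a box rotated by `yaw` around the z axis (assumed parameterization)."""
    signs = np.array(
        [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
        dtype=np.float64,
    )
    local = 0.5 * signs * np.asarray(size, dtype=np.float64)
    c, s = np.cos(yaw), np.sin(yaw)
    rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return local @ rotation.T + np.asarray(center, dtype=np.float64)


def cluster_boxes(corners, scores, threshold=0.5, radius=1.0):
    """Greedy grouping of per-voxel boxes (hypothetical stand-in for the paper's clustering).

    corners: (N, 8, 3) predicted corners, scores: (N,) objectness scores.
    Boxes whose centers lie within `radius` of the highest-scoring unassigned
    box are averaged into one detection.
    """
    keep = scores > threshold
    corners, scores = corners[keep], scores[keep]
    centers = corners.mean(axis=1)
    assigned = np.zeros(len(scores), dtype=bool)
    detections = []
    for i in np.argsort(-scores):
        if assigned[i]:
            continue
        distances = np.linalg.norm(centers - centers[i], axis=1)
        members = (~assigned) & (distances < radius)
        assigned |= members
        detections.append(corners[members].mean(axis=0))
    return detections
```

Averaging the corners within a group only makes sense because `box_corners` always emits the corners in the same order; with an inconsistent ordering, the corners would have to be matched first.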


Figure 1: Illustration of the simple network used for object detection. Unfortunately, details on the number of channels and filter sizes are missing.

Figure 2: Qualitative results demonstrating the objectness and bounding box predictions (left), the detections after clustering (middle), and the 3D bounding box predictions (right).

  • [] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3354–3361, 2012.