Li et al. perform vehicle detection in 3D Lidar data using fully convolutional networks. To this end, they propose to run 2D fully connected convolutional networks on point maps, i.e. backprojections of the 3D Lidar data to the image plane. An example of a point map is shown in Figure 1.
The network architecture strongly mirrors the architectures used in  and  and is shown in Figure 2. The convolutional layers conv1 to conv3 are followed by pooling layers to subsample the feature maps. Note that conv1 subsamples the input by 4 pixels horizontally and 2 pixels vertically. This is because by construction of the point map is denser in the horizontal direction. The subsampled feature maps are then combined and upsampled to predict two outputs: an objectness score per point, and a 3D bounding box per point (bounding box map and objectness map in Figure 2).
Objectness is represented by a 2-unit Softmax output, trained using negative log-likelihood. The bounding boxes are represented by 8 corner points which are predicted simultaneously. For details on the corner point representation, see the paper. To balance positive and negative sample points (i.e. vehicle and non-vehicle points), negative points are re-weighted. Similarly, to avoid a bias towards close vehicles, positive samples closer to the Velodyne sensor are re-weighted, as well. Training is performed jointly minimizing the negative log-likelihood and the Euclidean distance of bounding boxes. During training, samples are transformed using random transformations in 3D close to the identity transformation. The approach is evaluated on the KITTI test set and shows promising results, among others compared to Vote3D .