J. Huang, S. You. Point cloud labeling using 3d convolutional neural network. ICPR, 2016.

Huang and You use simple 3D convolutional networks for point cloud labeling. Given a large point cloud, e.g. one covering a part of Ottawa, they extract local point clouds by moving a center point through the data and cropping a cubic bounding box of fixed radius around it. Each extracted point cloud is converted into a voxelized occupancy grid that serves as the network input. The labels are inferred using a voting scheme for each voxel (as multiple labels can be present in each voxel). They report using $8000$ cells as input, which would correspond to a $20 \times 20 \times 20$ grid. This is rather small; the authors justify it by noting that 3D convolutional networks quickly reach the memory limit.
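The cropping, voxelization and voting steps can be sketched as follows. This is a minimal NumPy illustration and not the authors' implementation: the function name `voxelize_box`, its parameters, the use of `-1` for empty voxels and the tie-breaking rule (smallest label wins) are all assumptions made for this example.

```python
import numpy as np

def voxelize_box(points, labels, center, radius, resolution=20):
    """Crop an axis-aligned cube around `center` and voxelize it.

    Hypothetical helper: returns a binary occupancy grid and a label grid,
    both of shape (resolution, resolution, resolution). The label grid holds
    the majority label of the points in each voxel, or -1 for empty voxels.
    Labels are assumed to be non-negative integers.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels, dtype=int)
    center = np.asarray(center, dtype=float)

    # Keep only the points inside the cube [center - radius, center + radius).
    lower = center - radius
    inside = np.all((points >= lower) & (points < center + radius), axis=1)
    pts, lbl = points[inside], labels[inside]

    # Map each remaining point to an integer voxel index.
    cell = 2.0 * radius / resolution
    idx = np.floor((pts - lower) / cell).astype(int)
    idx = np.clip(idx, 0, resolution - 1)  # guard against rounding at the faces

    occupancy = np.zeros((resolution,) * 3, dtype=np.uint8)
    label_grid = np.full((resolution,) * 3, -1, dtype=int)
    occupancy[idx[:, 0], idx[:, 1], idx[:, 2]] = 1

    # Majority vote among the labels of the points that fall into each voxel.
    votes = {}
    for key, label in zip(map(tuple, idx), lbl):
        votes.setdefault(key, []).append(label)
    for key, voxel_labels in votes.items():
        label_grid[key] = np.bincount(voxel_labels).argmax()
    return occupancy, label_grid

# Toy example: three points share one voxel, one lies outside the box.
points = [[0.15, 0.15, 0.15], [0.16, 0.14, 0.13], [0.12, 0.17, 0.18],
          [5.0, 5.0, 5.0], [-0.95, -0.95, -0.95]]
labels = [2, 2, 1, 3, 0]
occupancy, label_grid = voxelize_box(points, labels, center=(0, 0, 0), radius=1.0)
```

With `resolution=20`, the grid has $20^3 = 8000$ cells, matching the input size mentioned above.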

The network is rather simple and is meant to perform per-voxel semantic labeling, the 3D analogue of per-pixel semantic segmentation. Motivated by LeNet [1], it consists of two 3D convolutional layers (where the convolutional layer is extended to 3D in a straightforward way) and two 3D pooling layers, followed by a fully connected layer. The architecture is illustrated in Figure 1, and qualitative results are shown in Figure 2.
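To get a feeling for how small the feature maps become in such a LeNet-style 3D network, the following sketch traces the spatial size through two convolution and pooling stages. The hyperparameters are not given in this summary; the kernel size of $5$, the pooling window of $2$, the lack of padding and the $20$ channels per layer are purely illustrative assumptions, as are the function names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    """Output size of non-overlapping pooling along one spatial axis."""
    return size // window

def trace_shapes(input_size=20, kernel=5, pool=2, channels=(20, 20)):
    """Trace (layer name, channels, side length) through conv/pool stages.

    All defaults are assumed values for illustration, not taken from the paper.
    """
    shapes = [("input", 1, input_size)]
    size = input_size
    for n, c in enumerate(channels, start=1):
        size = conv_out(size, kernel)
        shapes.append((f"conv{n}", c, size))
        size = pool_out(size, pool)
        shapes.append((f"pool{n}", c, size))
    # Number of features fed into the fully connected layer.
    flat = channels[-1] * size ** 3
    return shapes, flat

shapes, flat = trace_shapes()
for name, c, s in shapes:
    print(f"{name}: {c} x {s} x {s} x {s}")
print("fully connected input features:", flat)
```

Under these assumptions, the side length shrinks from $20$ to $16$, $8$, $4$ and finally $2$, so the fully connected layer sees only $20 \cdot 2^3 = 160$ features.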

Figure 1: Illustration of the used network architecture.

Figure 2: Qualitative results.

  • [1] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.