

Alberto Garcia-Garcia, Francisco Gomez-Donoso, José García Rodríguez, Sergio Orts-Escolano, Miguel Cazorla, Jorge Azorín López. PointNet: A 3D Convolutional Neural Network for real-time object class recognition. IJCNN, 2016.

Garcia-Garcia et al. present several experiments using VoxNet [] on density occupancy grids for 3D shape classification. Figure 1 shows the network architecture: they introduce a so-called PC Data Layer that converts a point cloud into a density grid, i.e. a voxel grid where each voxel stores the number of points falling inside it. On the ModelNet dataset [], they convert the given meshes into point clouds using ray tracing; Figure 2 shows an example of the resulting representation. Apart from this change in input representation, they do not modify the architecture.

Figure 1: Architecture used, where the PC Data Layer is described in the text. The parameters $300$ and $5$ correspond to a voxel grid of $60^3$ voxels, i.e. a volume of $300^3$ units with voxel size $5^3$.
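To make the density grid concrete, here is a minimal NumPy sketch of the conversion described above. It is not the authors' PC Data Layer: the function name `density_grid`, the assumption that the point cloud already lies inside $[0, 300)^3$, and the choice to discard points outside that volume are all mine. Only the parameters $300$ and $5$ and the "number of points per voxel" definition come from the paper summary.

```python
import numpy as np


def density_grid(points, extent=300.0, voxel_size=5.0):
    """Count the points falling into each voxel of a cubic grid.

    Hypothetical helper for illustration; assumes `points` is an (N, 3)
    array already translated so that the object lies in [0, extent)^3.
    """
    n = int(round(extent / voxel_size))  # 300 / 5 = 60 voxels per axis
    points = np.asarray(points, dtype=np.float64)

    # Map each coordinate to the integer index of its voxel.
    idx = np.floor(points / voxel_size).astype(np.int64)

    # Assumption: points outside the grid are simply dropped.
    inside = np.all((idx >= 0) & (idx < n), axis=1)

    grid = np.zeros((n, n, n), dtype=np.float32)
    # np.add.at accumulates correctly when several points share a voxel,
    # unlike plain fancy-index assignment.
    np.add.at(grid, tuple(idx[inside].T), 1.0)
    return grid


# Usage: a random cloud of 1000 points inside the 300^3 volume.
rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 300.0, size=(1000, 3))
grid = density_grid(cloud)
print(grid.shape)  # (60, 60, 60)
```

Whether and how the counts are normalized before being fed to the network is not covered in this summary, so the sketch returns raw counts.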


Figure 2: Illustration of the representation used. From left to right: original mesh, point cloud obtained by ray tracing, and density voxel grid fed into the 3D convolutional neural network.


Interestingly, they report that deeper networks do not improve performance, which they attribute to severe over-fitting and the highly unbalanced ModelNet10. Unfortunately, they only experiment with two architectures, the second of which adds just one convolutional layer to the architecture depicted in Figure 1. They also do not present any experiments aimed at overcoming these difficulties.

  • [] D. Maturana and S. Scherer. VoxNet: A 3D convolutional neural network for real-time object recognition. IROS, 2015.
  • [] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. CVPR, 2015.
