Garcia-Garcia et al. present several experiments conducted using VoxNet  on density occupancy grids for 3D shape classification. The used network architecture is shown in Figure 1 – they introduce a so-called PC Data Layer which converts a point cloud into a density grid, i.e. a voxel grid where each voxel represents the number of points within the voxel. On the ModelNet dataset , they convert the given meshes into point clouds using ray tracing. An example of their representation is shown in Figure 2. Except for the change in representation, they do not introduce changes to the architecture.
Figure 1 (click to enlarge): Used architecture, where the PC Data Layer is described in the text. The parameters, i.e. $300$ and $5$ correspond to a voxel grid of $60^3$, i.e. $300^3$ units with voxel size $5^3$.
Figure 2 (click to enlarge): Illustration of the used representation. From left to right: original mesh, point cloud obtained by ray tracing and density voxel grid fed into the 3D convolutional neural network.
Interestingly, they report that deeper networks are not able to increase performance. They attribute this fact to severe over-fitting and the highly unbalanced ModelNet10. Unfortunately, they only experiment with two different architectures, where the second one adds only one additional convolutional layer to the architecture depicted in Figure 1. Additionally, they do not present any experiments to overcome these difficulties.
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: