Vishakh Hegde, Reza Zadeh. FusionNet: 3D Object Classification Using Multiple Data Representations. CoRR, 2016.

Hegde and Zadeh discuss the fusion of multi-view convolutional neural networks (CNNs) and volumetric (3D) CNNs for shape classification on ModelNet [27]. They combine a multi-view CNN similar to [24], but based on AlexNet, with two volumetric CNNs whose architectures are shown in Figure 1 and Figure 2, respectively. Both volumetric architectures are simple and small, adding only a few parameters to the multi-view CNN. Interestingly, the convolutional kernels used have size $3 \times 3 \times 30$ for volumes of size $30^3$, so each kernel spans the full extent of the volume along one axis. This way, the authors hope to learn long-range correlations between voxels, assuming that the models are trained on all possible orientations of the shapes.
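To make the effect of the $3 \times 3 \times 30$ kernels concrete, the following sketch computes the output shape of such a convolution on a $30^3$ volume and runs a naive single-channel 3D convolution. It assumes stride 1, that the long kernel dimension lies along the third axis, and no padding unless stated; the paper's exact padding and channel configuration are not given here, and the helper names (`conv_output_size`, `output_shape`, `conv3d_valid`) are hypothetical. The point it illustrates is that each output value aggregates all 30 voxels along that axis, which collapses to size 1.

```python
from typing import List, Sequence, Tuple

Volume = List[List[List[float]]]  # indexed as volume[x][y][z]


def conv_output_size(n: int, k: int, pad: int = 0, stride: int = 1) -> int:
    """Number of output positions along one axis of a convolution."""
    return (n + 2 * pad - k) // stride + 1


def output_shape(volume_shape: Sequence[int], kernel_shape: Sequence[int],
                 pad: Sequence[int] = (0, 0, 0)) -> Tuple[int, ...]:
    """Per-axis output size for stride 1 and the given per-axis padding."""
    return tuple(conv_output_size(n, k, p)
                 for n, k, p in zip(volume_shape, kernel_shape, pad))


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Naive single-channel 3D cross-correlation, stride 1, no padding, no bias."""
    X, Y, Z = len(volume), len(volume[0]), len(volume[0][0])
    kx, ky, kz = len(kernel), len(kernel[0]), len(kernel[0][0])
    ox, oy, oz = X - kx + 1, Y - ky + 1, Z - kz + 1
    out = [[[0.0] * oz for _ in range(oy)] for _ in range(ox)]
    for i in range(ox):
        for j in range(oy):
            for k in range(oz):
                s = 0.0
                # Sum over the whole kernel window; with kz == Z this window
                # covers the entire third axis of the volume.
                for a in range(kx):
                    for b in range(ky):
                        for c in range(kz):
                            s += volume[i + a][j + b][k + c] * kernel[a][b][c]
                out[i][j][k] = s
    return out


if __name__ == "__main__":
    print(output_shape((30, 30, 30), (3, 3, 30)))                  # (28, 28, 1)
    print(output_shape((30, 30, 30), (3, 3, 30), pad=(1, 1, 0)))   # (30, 30, 1)
    ones = [[[1.0] * 30 for _ in range(30)] for _ in range(30)]
    kernel = [[[1.0] * 30 for _ in range(3)] for _ in range(3)]
    out = conv3d_valid(ones, kernel)
    print(len(out), len(out[0]), len(out[0][0]))  # 28 28 1
    print(out[0][0][0])                           # 270.0
```

With an all-ones volume and kernel, every output value is $3 \cdot 3 \cdot 30 = 270$, confirming that a single kernel application touches a full column of 30 voxels.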

Figure 1: Illustration of the architecture of their “first” volumetric CNN.

Figure 2: The network architecture of their “second” volumetric CNN. The architecture borrows ideas from the Inception modules used in GoogLeNet [25].

Experimental results show that, used alone, the multi-view CNN is still superior to the volumetric CNNs. However, the volumetric CNNs are trained and evaluated at a resolution of only $30^3$. When combining the two volumetric CNNs with their multi-view CNN, the authors outperform the state of the art on ModelNet. The models are combined through a linear combination of their class scores, with the weights determined by cross-validation.
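The score-level fusion described above can be sketched as a weighted sum of per-class scores followed by an argmax. As a simplification of the cross-validation used in the paper, this sketch picks the weights by exhaustive grid search on a single held-out set, restricting them to non-negative values that sum to 1; the grid step, the simplex constraint, and all function names are assumptions for illustration, not the authors' procedure.

```python
from itertools import product
from typing import List, Sequence, Tuple

Scores = List[List[float]]  # scores[sample][class]


def fuse(score_sets: Sequence[Scores], weights: Sequence[float]) -> Scores:
    """Linear combination of the per-class scores of several models."""
    n_samples, n_classes = len(score_sets[0]), len(score_sets[0][0])
    return [[sum(w * s[i][c] for w, s in zip(weights, score_sets))
             for c in range(n_classes)]
            for i in range(n_samples)]


def accuracy(scores: Scores, labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = sum(1 for row, y in zip(scores, labels)
                  if max(range(len(row)), key=row.__getitem__) == y)
    return correct / len(labels)


def grid_search_weights(score_sets: Sequence[Scores], labels: Sequence[int],
                        step: float = 0.1) -> Tuple[Tuple[float, ...], float]:
    """Try every non-negative weight vector on a grid summing to 1; keep the best."""
    ticks = round(1 / step)
    best_acc, best_w = -1.0, None
    for combo in product(range(ticks + 1), repeat=len(score_sets)):
        if sum(combo) != ticks:
            continue  # stay on the simplex: weights must sum to 1
        weights = tuple(t / ticks for t in combo)
        acc = accuracy(fuse(score_sets, weights), labels)
        if acc > best_acc:
            best_acc, best_w = acc, weights
    return best_w, best_acc
```

For three models (one multi-view and two volumetric CNNs) and a step of 0.1, the grid contains 66 candidate weight vectors, so exhaustive search is cheap; a finer grid or a proper k-fold split would be natural refinements.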

  • [24] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 945–953, 2015.
  • [25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
  • [27] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015.