I am looking for full-time (applied) research opportunities in industry, involving (trustworthy and robust) machine learning or (3D) computer vision, starting early 2022. Check out my CV and get in touch on LinkedIn!


Vishakh Hegde, Reza Zadeh. FusionNet: 3D Object Classification Using Multiple Data Representations. CoRR, 2016.

Hegde and Zadeh discuss the fusion of multi-view convolutional neural networks (CNNs) and volumetric/3D CNNs for shape classification on ModelNet []. They combine a multi-view CNN similar to [] but based on AlexNet with two volumetric CNNs – the architectures are shown in Figure 1 and Figure 2 respectively. Both architectures are quite simple and small, adding only few parameters to the multi-view CNN. Interestingly, the used convolutional kernels have size $3 \times 3 \times 30$ for volumes of size $30^3$. This way, they hope to learn long-range correlation of the voxels assuming that the models are trained on all possible orientations of the shapes.


Figure 1 (click to enlarge): Illustration of the architecture of their “first” volumetric CNN.

Figure 2 (click to enlarge): The network architecture of their “second” volumetric CNN. The architecture lends ideas from the Inception modules discussed for GoogLeNet [].


Experimental results show that, used alone, the multi-view CNN is still superior to the volumetric CNNs. But on the other hand, these are trained and evaluated on a resolution of $30^3$ only. When combining two volumetric CNNs with their multi-view CNN they are able to outperform the state-of-the-art on ModelNet. They combine the models using a linear combination of the class scores where the weights are determined using cross-validation.

  • [] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d shapenets: A deep representation for volumetric shapes. CVPR, 2015.
  • [] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multiview convolutional neural networks for 3d shape recognition. CVPR, 2015.
  • [] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, 2014.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: