Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling. CoRR, 2016.

Wu et al. propose an extension to the VAE-GAN model [1] to 3D data in order to tackle 3D shape generation and classification. In the VAE-GAN model a variational autoencoder is combined with a generative adversarial network as illsutrated in Figure 1. For details, see [1].


Figure 1 (click to enlarge): Illustration of the VAE-GAN model proposed by Larsen et al. [1] and generalized to 3D data by Wu et al. The variational auto-encoder consisting of encoder and decoder represents the connection between the input $x$ and the latent variables (or the code) $z$. The adversarial generative network combines a generator and a discriminator. The generator tries to generate data and fool the discriminator into believing that the generated data is real. In the VAE-GAN model, the decoder and the generator represent the same model, i.e. share their parameters. Training is performed by minimizing a combination of the losses of both models.

The network architecture used for the generator is illustrated in Figure 2. The discriminator mirrors this structure. The encoder is a convolutional neural network operating on images (not 3D data) consisting of 5 convolutional layers followed by batch normalization and ReLU activation layers. The idea is that the encoder allows to get the latent variables $z$ from a 2D image and then perform 3D reconstruction using the decoder/generator, taking $z$ as input, to generated a 3D shape.


Figure 2 (click to enlarge: Illustration of the network architecture used for the generator. The discriminator mirrors this architecture.

Results of data generation are shown in Figure 3. For visualization, z is sampled form a uniform distribution and the largest connected component is visualized. 3D reconstruction results are demonstrated in Figure 4.


Figure 3 (click to enlarge): Generation results without a reference object.


Figure 3 (click to enlarge): 3D reconstruction results.

  • [1] Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Ole Winther. Autoencoding beyond pixels using a learned similarity metric. ICML, 2016.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: