Wu et al. propose an extension of the VAE-GAN model to 3D data in order to tackle 3D shape generation and classification. In the VAE-GAN model, a variational autoencoder is combined with a generative adversarial network, as illustrated in Figure 1. For details on the VAE-GAN model, see Larsen et al.
Figure 1: Illustration of the VAE-GAN model proposed by Larsen et al. and generalized to 3D data by Wu et al. The variational autoencoder, consisting of encoder and decoder, represents the connection between the input $x$ and the latent variables (or the code) $z$. The generative adversarial network combines a generator and a discriminator. The generator tries to generate data that fools the discriminator into believing that the generated data is real. In the VAE-GAN model, the decoder and the generator are the same model, i.e. they share their parameters. Training is performed by minimizing a combination of the losses of both models.
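To make the "combination of the losses" concrete, the objective can be sketched schematically as follows. The symbols $E$ (encoder), $G$ (decoder/generator), $D$ (discriminator), the prior $p(z)$ and the weights $\alpha_1, \alpha_2$ are introduced here for illustration; the squared-error reconstruction term is an assumption, and the exact form of the reconstruction term differs between formulations.

$$
\begin{aligned}
\mathcal{L} &= \mathcal{L}_{\text{GAN}} + \alpha_1 \mathcal{L}_{\text{KL}} + \alpha_2 \mathcal{L}_{\text{recon}},\\
\mathcal{L}_{\text{GAN}} &= \log D(x) + \log\big(1 - D(G(z))\big),\\
\mathcal{L}_{\text{KL}} &= D_{\text{KL}}\big(q(z \mid x) \,\|\, p(z)\big),\\
\mathcal{L}_{\text{recon}} &= \| G(E(x)) - x \|_2^2.
\end{aligned}
$$

The discriminator is trained to maximize $\mathcal{L}_{\text{GAN}}$ while the generator is trained to minimize it; the two remaining terms are the usual variational autoencoder terms, with $q(z \mid x)$ the distribution over codes produced by the encoder.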
The network architecture used for the generator is illustrated in Figure 2. The discriminator mirrors this structure. The encoder is a convolutional neural network operating on images (not 3D data), consisting of 5 convolutional layers followed by batch normalization and ReLU activation layers. The idea is that the encoder makes it possible to obtain the latent variables $z$ from a 2D image; the decoder/generator then takes $z$ as input and generates a 3D shape, thereby performing 3D reconstruction.
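As a rough illustration of how five convolutional layers can reduce an image to a single code vector, the following sketch traces the tensor shapes through such an encoder. All hyperparameters (input size, channels, kernel sizes, strides, padding) are assumptions chosen so that the shapes work out; the text above only fixes the number of layers. The final 400 channels are likewise assumed to hold a mean and a log-variance for a 200-dimensional code.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution, using the floor convention."""
    return (size + 2 * padding - kernel) // stride + 1


# (out_channels, kernel, stride, padding) per layer.
# Hypothetical values for illustration, not taken from the text above.
ENCODER_LAYERS = [
    (64, 11, 4, 4),
    (128, 5, 2, 2),
    (256, 5, 2, 2),
    (512, 5, 2, 2),
    (400, 8, 1, 0),
]


def trace_encoder_shapes(image_size=256, in_channels=3, layers=ENCODER_LAYERS):
    """Return the (channels, height, width) shape after each layer."""
    shapes = [(in_channels, image_size, image_size)]
    size = image_size
    for out_channels, kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        shapes.append((out_channels, size, size))
    return shapes


if __name__ == "__main__":
    for shape in trace_encoder_shapes():
        print(shape)
```

Batch normalization and ReLU do not change the shape, so they are omitted from the trace. With these assumed settings, the last layer collapses the spatial dimensions to $1 \times 1$, leaving a flat vector from which $z$ can be sampled.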
Results of data generation are shown in Figure 3. For visualization, $z$ is sampled from a uniform distribution and only the largest connected component of the generated shape is shown. 3D reconstruction results are shown in Figure 4.
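The visualization step can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the range of the uniform distribution, the occupancy threshold of 0.5 and the use of 6-connectivity are all choices made here for illustration, and `sample_latent` and `largest_connected_component` are hypothetical helper names.

```python
import random
from collections import deque


def sample_latent(dim, low=0.0, high=1.0, seed=None):
    """Draw a latent code z with i.i.d. uniform entries (range is an assumption)."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(dim)]


def largest_connected_component(voxels, threshold=0.5):
    """Return the index set of the largest 6-connected occupied component.

    `voxels` is a nested list voxels[i][j][k] of occupancy values; cells
    above `threshold` count as occupied (the threshold is an assumption).
    """
    nx, ny, nz = len(voxels), len(voxels[0]), len(voxels[0][0])
    occupied = {
        (i, j, k)
        for i in range(nx) for j in range(ny) for k in range(nz)
        if voxels[i][j][k] > threshold
    }
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen, best = set(), set()
    for start in occupied:
        if start in seen:
            continue
        # Breadth-first search over face-adjacent occupied voxels.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in offsets:
                nb = (i + di, j + dj, k + dk)
                if nb in occupied and nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return best
```

In use, `z = sample_latent(200)` would be fed to the trained generator (not shown here), and `largest_connected_component` would be applied to the resulting voxel grid to discard small floating fragments before rendering.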