IAM

RESEARCH

Shape Completion

This page presents a novel learning-based and weakly-supervised approach to 3D shape completion of point clouds. Specifically, a shape prior enables to learn shape completion without access to ground truth shapes, as relevant in many scenarios including autonomous driving, 3D scene understanding or surface reconstruction. Below, a more detailed abstract can be found. Additionally, this page provides the full CVPR'18 paper as well as the source code and two new synthetic shape completion benchmarks.

Quick links: CVPR Paper | CVPR Poster (coming soon) | BibTex | Source Code (coming soon)

Abstract

3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, full supervision is required which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, ie, learn, maximum likelihood fitting using deep neural networks resulting in efficient shape completion without sacrificing accuracy. Tackling 3D shape completion of cars on ShapeNet and KITTI, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet, we additionally show that the approach is able to generalize to other object categories as well.

Download & Citing

The CVPR paper can be downloaded below; the supplementary material with additional experimental results and details on the setup is provided separately:

Paper (∼ 2.7MB)Supplementary (∼ 4.9MB)Poster (coming soon)

@inproceedings{Stutz2018CVPR,
    title = {Learning 3D Shape Completion from Laser Scan Data with Weak Supervision },
    author = {Stutz, David and Geiger, Andreas},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    publisher = {IEEE Computer Society},
    year = {2018}
}

The corresponding master thesis can be found below:

Thesis (∼ 26.7MB)Slides (∼ 10.8MB)

@misc{Stutz2017,
    author = {David Stutz},
    title = {Learning Shape Completion from Bounding Boxes with CAD Shape Priors},
    month = {September},
    year = {2017},
    institution = {RWTH Aachen University},
    address = {Aachen, Germany},
    howpublished = {http://davidstutz.de/},
}

Code and Data

Coming soon!

News & Updates

Apr 12, 2018. The LaTeX source of the CVPR'18 paper and the thesis is now available on GitHub: davidstutz/master-thesis-shape-completion, davidstutz/cvpr2018-shape-completion.

Apr 11, 2018. The CVPR'18 paper and my master thesis are now available as PDF. This project page, as well as the official project page went live.

Method

Figure 1 (click to enlarge): Overview of the proposed, weakly-supervised 3D shape completion approach; see text for details.

We focus on the problem of inferring and completing 3D shapes based on sparse and noisy 3D point observations as illustrated in Figure 1. Existing approaches to shape completion can be into data-driven and learning-based methods. The former usually rely on learned shape priors and formulate shape completion as optimization problem over the corresponding (lower-dimensional) latent space [][][]. Learning-based approaches, in contrast, assume a fully supervised setting in order to directly learn shape completion on synthetic data [][][][][][]. In the paper, we tackle the problems of both approaches -- the optimization problem of data-driven approaches and the required supervision of learning-based approaches -- in order to combine their strengths -- applicability to real data and efficient inference.

To this end, we propose an amortized maximum likelihood approach for 3D shape completion. More specifically, we first learn a shape model on synthetic data using a variational auto-encoder (Figure 1, step 1). Shape completion can then be formulated as maximum likelihood problem -- in the spirit of []. Instead of maximizing the likelihood independently for distinct observations, however, we follow the idea of amortized inference and learn to predict the maximum likelihood solutions directly given the observations. Towards this goal, we train a new encoder which embeds the observations in the same latent space using an unsupervised maximum likelihood loss (Figure 1, step 2). This allows us to learn 3D shape completion in challenging real-world situations, e.g., on KITTI []. For experimental evaluation, we introduce two novel, synthetic shape completion benchmarks based on ShapeNet [] and ModelNet []. On KITTI, we further compare our approach to the work of Engelmann et al. [] -- the only related work which addresses shape completion on KITTI. Our experiments demonstrate that we obtain shape reconstructions which rival data-driven techniques while significantly reducing inference time.

Experiments

Figure 2 (click to enlarge): Qualitative results on ShapeNet and KITTI, comparing against [] as well as a maximum likelihood (ML) baseline and a fully-supervised baseline (Sup). The proposed approach, amortized maximum likelihood is abbreviated as AML.

Figure 3 (click to enlarge): Qualitative results on ModelNet, where our AML model is trained on all ten classes. We compare against our supervised baseline (Sup) on occupancy grids only.

Figures 2 and 3 show qualitative results of the proposed amortized maximum likelihood (AML) approach to shape completion. We compare against a maximum likelihood (ML) baseline, [] and a supervised baseline (Sup) — see the paper for details. The qualitative results illustrate that our approach is competitive to fully-supervised approaches while only using a fraction of the supervision. Similarly, we show that AML is able to generalize across object categories and is able to correctly identify object categories based on sparse observations. Additionally, our approach is shown to outperform [] while being significantly faster.

Conclusion

We presented a weakly-supervised, learning-based approach to 3D shape completion. After using a variational auto-encoder to learn a shape prior on synthetic data, we formulated shape completion as maximum likelihood problem. We fixed the learned generative model and trained a new, deterministic encoder to amortize the ML problem. This encoder can be trained in an unsupervised fashion. On ShapeNet and ModelNet, we demonstrated that our approach outperforms a state-of-the-art data-driven method [] (while significantly reducing runtime) and generalizes across object categories. We also showed that it is able to compete with the fully-supervised model both quantitatively and qualitatively while using 9% or less supervision. We also demonstrated the applicability to real data on KITTI. Overall, our experiments demonstrate the benefits of the proposed approach: reduced runtime compared to data-driven approaches and training on unlabeled, real data compared to learning-based approaches.

Acknowledgements

As this work was part of a master thesis, written at the Autonomous Vision Group at Max Planck Institute for Intelligent System, I want to thank Prof. Andreas Geiger for making this work possible. From RWTH Aachen University, Prof. Bastian Leibe supported this thesis.

We also thank Francis Engelmann for help with [].

References

  • [] S. Bao, M. Chandraker, Y. Lin, and S. Savarese. Dense object reconstruction with semantic priors. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2013.
  • [] A. Dame, V. Prisacariu, C. Ren, and I. Reid. Dense reconstruction using 3D object shape priors. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2013.
  • [] F. Engelmann, J. St ̈uckler, and B. Leibe. Joint object pose estimation and shape reconstruction in urban street scenes using 3D shape priors. In Proc. of the German Conference on Pattern Recognition (GCPR), 2016.
  • [] A. Dai, C. R. Qi, and M. Nießner. Shape completion using 3d-encoder-predictor cnns and shape synthesis. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [] H. Fan, H. Su, and L. J. Guibas. A point set generation network for 3d object reconstruction from a single image. arXiv.org, abs/1612.00603, 2016.
  • [] D. J. Rezende, S. M. A. Eslami, S. Mohamed, P. Battaglia, M. Jaderberg, and N. Heess. Unsupervised learning of 3d structure from images. arXiv.org, 1607.00662, 2016.
  • [] G. Riegler, A. O. Ulusoy, H. Bischof, and A. Geiger. OctNetFusion: Learning depth fusion from data. In Proc. of the International Conf. on 3D Vision (3DV), 2017.
  • [] A. Sharma, O. Grau, and M. Fritz. Vconv-dae: Deep volumetric shape learning without object labels. arXiv.org, 1604.03755, 2016.
  • [] E. Smith and D. Meger. Improved adversarial systems for 3d object generation and reconstruction. arXiv.org, abs/1707.09557, 2017.
  • [] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2012.
  • [] A. X. Chang, T. A. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. Shapenet: An information-rich 3d model repository. arXiv.org, 1512.03012, 2015.
  • [] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015.

What is your opinion on this article? Did you find it interesting or useful? Let me know your thoughts in the comments below or get in touch with me: