"A Point Set Generation Network for 3D Object Reconstruction from a Single Image", Fan et al. • David Stutz

OCTOBER 2017

READING

Haoqiang Fan, Hao Su, Leonidas J. Guibas. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. CoRR abs/1612.00603, 2016.

Fan et al. introduce point set generation networks, which are closely related to and based on the PointNet idea []. Tackling the problem of single-image 3D reconstruction, they make two major contributions: defining and discussing reconstruction losses suitable for comparing two point clouds, and extending the chosen loss to account for uncertainty. In general, they consider a model of the form

$S = G(I, r; \theta)$

where $S$ is the predicted point cloud, $I$ the input image (e.g. with depth) and $r$ a random variable perturbing the input (e.g. $r \sim \mathcal{N}(0,1)$). The vanilla (baseline) model they propose is illustrated in Figure 1.

Figure 1: Vanilla architecture consisting of a convolutional encoder and a predictor, which is essentially a PointNet [].

Regarding the loss, they propose both the Chamfer distance and the Earth Mover Distance:

$D_{CD}(S_1, S_2) = \sum_{x \in S_1} \min_{y \in S_2} \|x - y\|_2^2 + \sum_{y \in S_2} \min_{x \in S_1} \|x - y\|_2^2$

$D_{EMD}(S_1, S_2) = \min_{\phi} \sum_{x \in S_1} \|x - \phi(x)\|_2$

where, for the Earth Mover Distance, $\phi$ is a bijection between the two point sets (which therefore need to have the same size); finding the optimal $\phi$ amounts to solving an assignment problem. As solving it exactly is expensive, they use an approximation for efficiency.
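To make the two distances concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names are my own, points are assumed to be tuples of floats, and the Earth Mover Distance is computed exactly by brute force over all bijections, which is only feasible for a handful of points (the paper uses an approximation instead).

```python
from itertools import permutations


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def chamfer_distance(s1, s2):
    """Sum of squared nearest-neighbour distances, in both directions."""
    forward = sum(min(sq_dist(x, y) for y in s2) for x in s1)
    backward = sum(min(sq_dist(x, y) for x in s1) for y in s2)
    return forward + backward


def earth_mover_distance(s1, s2):
    """Exact EMD by trying every bijection s1 -> s2 (tiny sets only)."""
    if len(s1) != len(s2):
        raise ValueError("a bijection needs point sets of equal size")
    return min(
        sum(sq_dist(x, y) ** 0.5 for x, y in zip(s1, perm))
        for perm in permutations(s2)
    )


# Two toy point sets on the x-axis (illustrative values, not from the paper).
s1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
s2 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

cd = chamfer_distance(s1, s2)       # 0 + 1 (forward) + 0 + 4 (backward) = 5
emd = earth_mover_distance(s1, s2)  # best matching is the identity: 0 + 2 = 2
print(cd, emd)
```

Note the difference in behavior: the Chamfer distance lets several points share the same nearest neighbour, while the Earth Mover Distance forces a one-to-one matching.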

However, neither distance on its own takes the uncertainty (modeled through the random variable $r$) into account. Therefore, they adapt the loss and state the overall optimization problem over the model parameters $\theta$ as

$\min_\theta \sum_k \min_{r_j \sim \mathcal{N}(0,1), 1 \leq j \leq n} \{d(G(I_k, r_j;\theta), S_k)\}$

where $S_k$ is the ground truth corresponding to image $I_k$. The loss is called the Min-of-N loss as it considers the minimum of $n$ randomized predictions.
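The inner minimum of the Min-of-N loss can be sketched as follows. This is a toy illustration under strong assumptions: `toy_generator` is a hypothetical stand-in for $G(I, r; \theta)$ that simply shifts a point set by $\theta \cdot r$, $r$ is a scalar, $\theta$ is a single number, and the Chamfer distance plays the role of $d$. The actual model is a neural network trained by gradient descent.

```python
import random


def chamfer_distance(s1, s2):
    """Chamfer distance between two lists of point tuples (used as d)."""
    def sq(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))
    return (sum(min(sq(x, y) for y in s2) for x in s1)
            + sum(min(sq(x, y) for x in s1) for y in s2))


def toy_generator(image, r, theta):
    """Hypothetical G(I, r; theta): shift every point of 'image' by theta * r."""
    return [tuple(c + theta * r for c in p) for p in image]


def min_of_n_loss(generator, image, target, theta, n, rng):
    """Smallest distance to the ground truth among n randomized predictions."""
    return min(
        chamfer_distance(generator(image, rng.gauss(0.0, 1.0), theta), target)
        for _ in range(n)
    )


def objective(generator, dataset, theta, n, rng):
    """Sum of the Min-of-N losses over all (image, ground truth) pairs."""
    return sum(min_of_n_loss(generator, i_k, s_k, theta, n, rng)
               for i_k, s_k in dataset)


# Toy data: the "image" is itself a point set here, purely for illustration.
image = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5)]

loss_1 = min_of_n_loss(toy_generator, image, target, 1.0, 1, random.Random(0))
loss_8 = min_of_n_loss(toy_generator, image, target, 1.0, 8, random.Random(0))
print(loss_1, loss_8)
```

With the same seed, the eight samples include the single sample of the first call, so `loss_8` can never exceed `loss_1`. This is the point of the loss: only the best of the $n$ randomized predictions is penalized, which leaves the model free to produce diverse outputs.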

They provide experimental results on various tasks, including shape completion from RGBD images, for which qualitative results can be found in Figure 2.

Figure 2: Qualitative results for the task of shape completion.

[] Charles Ruizhongtai Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. CoRR abs/1612.00593, 2016.

