2nd February 2018

Considerable effort must have gone into setting up the dataset and evaluation pipeline. The authors sample 3D keypoint correspondences from 3D Harris points in 3D scenes captured, e.g., with Microsoft’s Kinect or Asus’ Xtion. To obtain ground truth, different viewpoints and video trajectories of the same scene are aligned using recent results in reconstruction [9]. This scheme allows them to generate a large dataset for learning. The 3D volumes are represented as truncated distance fields: for each pair of matching (or non-matching) keypoints, a $31 \times 31 \times 31$ volume is extracted around each keypoint and fed to the feature learning network. These volumes correspond roughly to a 15 cm vicinity of the keypoints.
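To make the volume extraction more concrete, here is a minimal sketch of how a truncated distance field patch around a keypoint could be computed. It is not the authors' implementation: the function name `extract_tdf_patch`, the point cloud input, the 1 cm voxel size (so that 31 voxels span about 15 cm on either side of the keypoint) and the 5 cm truncation value are all assumptions made for illustration.

```python
import numpy as np


def extract_tdf_patch(points, keypoint, grid_size=31, voxel_size=0.01, truncation=0.05):
    """Return a (grid_size, grid_size, grid_size) truncated distance field.

    Hypothetical helper: each voxel stores the distance from its center to the
    nearest surface point, clipped at `truncation`. The voxel size and the
    truncation value are assumed, not taken from the paper.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 3)
    keypoint = np.asarray(keypoint, dtype=float)

    # Voxel centers along one axis, symmetric around the keypoint.
    half = (grid_size - 1) / 2.0
    offsets = (np.arange(grid_size) - half) * voxel_size
    gx, gy, gz = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1) + keypoint  # (G, G, G, 3)

    # Start at the truncation value and keep the minimum distance seen so far.
    # Looping over points is slow but keeps memory usage low.
    tdf = np.full((grid_size,) * 3, truncation, dtype=float)
    for p in points:
        dist = np.linalg.norm(centers - p, axis=-1)
        np.minimum(tdf, dist, out=tdf)
    return tdf


# Toy usage: a single surface point located exactly at the keypoint.
patch = extract_tdf_patch([[0.0, 0.0, 0.0]], [0.0, 0.0, 0.0])
print(patch.shape)
```

For real scans, a k-d tree lookup would replace the loop over points, and a pair of such patches (one per keypoint) would form one training example for the network.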