# DAVIDSTUTZ

8th May 2017

Zeeshan Hayder, Xuming He, Mathieu Salzmann. Shape-aware Instance Segmentation. CoRR, 2016.

Hayder et al. introduce a framework for instance-level semantic segmentation called the Shape-Aware Instance Segmentation (SAIS) network. The main idea is to combine the Region Proposal Network (RPN) of [] with the newly proposed Object Mask Network (OMN). The motivation is to obtain good instance-level segmentations even from bounding boxes that do not fully cover the object, as illustrated in Figure 1.

Object Mask Networks (OMNs) are regular neural networks that predict what Hayder et al. call a shape-aware mask representation. In particular, given a bounding box, they do not predict a binary mask but a per-pixel distance transform (as illustrated in Figure 1). A distance transform encodes, for each pixel, the Euclidean distance to the nearest object boundary pixel. Hayder et al. additionally truncate the distance transform at a maximum value $R$. These values are then quantized into $K$ bins, that is, the distance transform values are represented by $K$-dimensional binary vectors such that the distance $D(p)$ for pixel $p$ can be expressed as

$D(p) = \sum_{n = 1}^K r_n \cdot b_n(p)$, $\sum_{n = 1}^K b_n(p) = 1$

where $r_n$ is the distance value associated with the $n$-th bin and the $b_n(p)$ form a one-hot binary vector that is predicted by the network. Given the distance transform per pixel, it is easy to obtain the final object mask by placing a disk of radius $D(p)$ at pixel $p$ and taking the union over these disks. Conveniently, this operation can be expressed as a convolution, which allows it to be integrated into the overall network structure; see the paper for details.
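To make the representation concrete, here is a minimal sketch in plain Python of the encoding and decoding described above. It is not the authors' implementation: the function names, the brute-force distance computation, the uniformly spaced bin values, and the choice to round each distance down to the nearest bin value are all assumptions made for illustration.

```python
import math


def truncated_distance_transform(mask, R):
    """For every object pixel, the Euclidean distance to the nearest
    non-object pixel, capped at R. Non-object pixels get 0."""
    h, w = len(mask), len(mask[0])
    outside = [(y, x) for y in range(h) for x in range(w) if not mask[y][x]]
    D = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                nearest = min(
                    (math.hypot(y - oy, x - ox) for oy, ox in outside),
                    default=float(R),  # no non-object pixel in the window
                )
                D[y][x] = min(nearest, float(R))
    return D


def one_hot_encode(D, bin_values):
    """Quantize D into K binary maps b[n] with exactly one active bin per pixel.
    Assumes bin_values is ascending and starts at 0; each distance is rounded
    down to the largest bin value r_n <= D(p) (an illustrative choice)."""
    h, w = len(D), len(D[0])
    b = [[[0] * w for _ in range(h)] for _ in bin_values]
    for y in range(h):
        for x in range(w):
            n = max(i for i, r in enumerate(bin_values) if r <= D[y][x] + 1e-9)
            b[n][y][x] = 1
    return b


def decode(b, bin_values):
    """Recover D(p) = sum_n r_n * b_n(p)."""
    h, w = len(b[0]), len(b[0][0])
    return [
        [sum(r * b[n][y][x] for n, r in enumerate(bin_values)) for x in range(w)]
        for y in range(h)
    ]


# Toy example: a 3x3 object inside a 5x5 window, R = 2, K = 3 bins.
mask = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
        for y in range(5)]
bin_values = [0, 1, 2]  # r_1, ..., r_K (assumed uniform spacing)
D = truncated_distance_transform(mask, R=2)
b = one_hot_encode(D, bin_values)
D_hat = decode(b, bin_values)
```

In this toy case the bin values represent every truncated distance exactly, so `D_hat` equals `D`; with coarser bins the decoded distances would only approximate the original ones.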

An OMN takes proposals from a Region Proposal Network, warps the corresponding features and predicts $K$ feature maps corresponding to $b_1,\ldots,b_K$. These feature maps are then fed into a deconvolution module that transforms them into a binary mask, as explained in the paper (essentially expressing the idea of placing a disk at every pixel location in terms of network layers).
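The step from the $K$ binary maps to a mask can be sketched as one convolution per bin followed by a union. The following plain-Python version is only an illustration of that idea under stated assumptions, not the network layers from the paper: `disk_kernel` and `mask_from_bins` are hypothetical names, the disks are taken as open (strictly inside the radius), and the maps are assumed to be hard one-hot values rather than network outputs.

```python
import math


def disk_kernel(radius):
    """Binary kernel with a 1 wherever the offset lies strictly inside a disk
    of the given radius. Returns (kernel, half_size)."""
    half = max(0, math.ceil(radius))
    kernel = [
        [1 if math.hypot(dy, dx) < radius else 0
         for dx in range(-half, half + 1)]
        for dy in range(-half, half + 1)
    ]
    return kernel, half


def mask_from_bins(b, bin_values):
    """Union of disks: convolve each binary map b_n with a disk of radius r_n
    and mark every pixel that receives a non-zero response from any bin."""
    h, w = len(b[0]), len(b[0][0])
    mask = [[0] * w for _ in range(h)]
    for b_n, r_n in zip(b, bin_values):
        kernel, half = disk_kernel(r_n)
        for y in range(h):
            for x in range(w):
                # The kernel is symmetric, so correlation equals convolution.
                response = 0
                for dy in range(-half, half + 1):
                    for dx in range(-half, half + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            response += b_n[yy][xx] * kernel[dy + half][dx + half]
                if response > 0:  # union over pixels and over bins
                    mask[y][x] = 1
    return mask


# Toy example on a 5x5 window with bin values r = (0, 1, 2):
# border pixels fall into bin 0, the centre pixel into bin 2, the rest into bin 1.
b0 = [[0 if 1 <= y <= 3 and 1 <= x <= 3 else 1 for x in range(5)] for y in range(5)]
b2 = [[1 if (y, x) == (2, 2) else 0 for x in range(5)] for y in range(5)]
b1 = [[1 - b0[y][x] - b2[y][x] for x in range(5)] for y in range(5)]
mask = mask_from_bins([b0, b1, b2], [0, 1, 2])
```

Here the result is the 3x3 square in the middle of the window. Using open disks means a disk of radius $D(p)$ never reaches the nearest non-object pixel, so the union does not grow beyond the object.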

The overall SAIS network puts a one-layer classifier on top of the OMN, based on the binary mask and the bounding box features from the RPN. The full network architecture is illustrated in Figure 2.

They demonstrate the effectiveness of the proposed model on PASCAL VOC 2012 and Cityscapes. Some qualitative results are shown in Figure 3. For quantitative results and a comparison to state-of-the-art techniques, see the paper.

• [] P. O. Pinheiro, R. Collobert, P. Dollár. Learning to segment object candidates. NIPS, 2015.
