Yukun Zhu, Raquel Urtasun, Ruslan Salakhutdinov, Sanja Fidler. segDeepM: Exploiting segmentation and context in deep neural networks for object detection. CVPR, 2015.
Zhu et al. show that object detection benefits from using object segmentation proposals and segmentation-based features as well as larger context. Their work builds largely on the ideas of [8] and [9]. In particular, they propose three important improvements and experimentally show their effectiveness:
Reason jointly over segmentations (obtained from CPMC [3]) and candidate bounding boxes using a Markov random field. The segmentations are also used to compute additional features.
Use a larger context for bounding box candidates by enlarging the original bounding box by a fixed percentage.
Iteratively refine the bounding boxes by repeatedly performing bounding box prediction on the final candidate set. After each prediction step, the features are re-computed for those bounding boxes that changed significantly.
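The joint reasoning over boxes and segments can be pictured as treating the segment as a latent variable: for each candidate box, the model picks the segment that best "explains" it. The following is a minimal sketch under that simplification, not the paper's actual potentials. The two compatibility features (fraction of the segment inside the box, fraction of the box covered by the segment) and the weights `w_box`/`w_seg` are hypothetical, and segments are assumed to be given as boolean masks.

```python
import numpy as np


def segment_features(box, mask):
    """Hypothetical box/segment compatibility features (not the paper's exact set).

    box: integer pixel coordinates (x1, y1, x2, y2), with x2/y2 exclusive.
    mask: 2D boolean array marking the segment's pixels.
    """
    x1, y1, x2, y2 = box
    inside = mask[y1:y2, x1:x2].sum()  # segment pixels falling inside the box
    seg_area = mask.sum()
    box_area = (x2 - x1) * (y2 - y1)
    frac_seg_in_box = inside / seg_area if seg_area > 0 else 0.0
    frac_box_covered = inside / box_area if box_area > 0 else 0.0
    return np.array([frac_seg_in_box, frac_box_covered], dtype=float)


def score_box(box, box_score, masks, w_box, w_seg):
    """Score a box by maximizing a linear score over the latent segment choice.

    Returns (best_score, index_of_selected_segment); (-inf, None) if no masks.
    """
    best_h, best_score = None, -np.inf
    for h, mask in enumerate(masks):
        # Appearance term (e.g. a CNN score) plus a segment-compatibility term.
        s = w_box * box_score + float(w_seg @ segment_features(box, mask))
        if s > best_score:
            best_h, best_score = h, s
    return best_score, best_h
```

Here a segment that exactly fills the box wins over one lying elsewhere in the image; in the actual model, the weights would be learned rather than fixed by hand.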
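Enlarging a candidate box for context features amounts to scaling it around its center. A minimal sketch follows; the growth `factor` of 0.5 (i.e. width and height grow by 50%) is an assumed value, not the percentage used in the paper, and clipping to the image is optional.

```python
def enlarge_box(box, factor=0.5, image_size=None):
    """Enlarge an axis-aligned box (x1, y1, x2, y2) around its center.

    factor: fractional growth of width and height (0.5 -> 1.5x); assumed value.
    image_size: optional (width, height) used to clip the enlarged box.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * (1.0 + factor) / 2.0
    half_h = (y2 - y1) * (1.0 + factor) / 2.0
    nx1, ny1, nx2, ny2 = cx - half_w, cy - half_h, cx + half_w, cy + half_h
    if image_size is not None:
        width, height = image_size
        # Keep the context window inside the image.
        nx1, ny1 = max(0.0, nx1), max(0.0, ny1)
        nx2, ny2 = min(float(width), nx2), min(float(height), ny2)
    return (nx1, ny1, nx2, ny2)
```

For example, `enlarge_box((10, 10, 30, 50))` returns `(5.0, 0.0, 35.0, 60.0)`; the context features would then be computed on this larger crop in addition to the original one.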
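The iterative refinement loop can be sketched as below. This is an illustration under assumptions: `extract_features` and `regress` are hypothetical callables standing in for the CNN feature extractor and the bounding box regressor, "changed significantly" is interpreted as an IoU between the old and new box below `min_iou`, and the number of iterations is fixed.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iterative_refinement(boxes: List[Box],
                         extract_features: Callable[[Box], object],
                         regress: Callable[[Box, object], Box],
                         num_iters: int = 3,
                         min_iou: float = 0.9) -> List[Box]:
    """Repeatedly regress boxes; recompute features only for boxes that moved a lot."""
    features = [extract_features(b) for b in boxes]
    for _ in range(num_iters):
        new_boxes = [regress(b, f) for b, f in zip(boxes, features)]
        for i, (old, new) in enumerate(zip(boxes, new_boxes)):
            # Feature extraction is expensive, so small moves reuse old features.
            if iou(old, new) < min_iou:
                features[i] = extract_features(new)
        boxes = new_boxes
    return boxes
```

Setting `num_iters=1` corresponds to the regular, single-step bounding box refinement used as a comparison in Table 1.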
Table 1 shows experimental results demonstrating the effectiveness of these three improvements over the baseline model, R-CNN. While the appearance features and context features (using the enlarged bounding box) are computed using pre-trained neural networks, specifically AlexNet [15], the segmentation features are mostly hand-crafted. It would be interesting to see whether pre-trained networks can be utilized for the segmentation-based features as well. Figure 1 additionally shows detection results and the corresponding segmentations selected by their approach.
Table 1: Accuracy for individual classes as well as overall mAP, showing the improvement of the discussed techniques over R-CNN. Here, seg refers to using the segmentation features, exp to the enlarged bounding boxes/context features, ibr to iterative bounding box refinement and br to regular bounding box refinement (i.e., performed once).
Figure 1: Qualitative results showing detection and recognition as well as the corresponding segmentations. From left to right: ground truth, R-CNN, the proposed approach, segments selected by the proposed approach.
[3] J. Carreira and C. Sminchisescu. Constrained parametric min-cuts for automatic object segmentation. CVPR, 2013.
[8] S. Fidler, R. Mottaghi, A. Yuille, and R. Urtasun. Bottom-up segmentation for top-down detection. CVPR, 2013.
[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv:1311.2524, 2013.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS, 2012.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.