A. Geiger, P. Lenz, R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. Conference on Computer Vision and Pattern Recognition, 2012.

Geiger et al. introduce the KITTI dataset, including benchmarks for vision tasks central to autonomous driving and advanced driver assistance systems: stereo, optical flow, 2D/3D object detection and visual SLAM. KITTI was the first dataset to provide (mostly) dense ground truth for stereo and optical flow on realistic urban scenes. This was achieved using the mounted Velodyne sensor, specifically a Velodyne HDL-64E (also check the videos provided at the bottom of the link, showing some interesting applications including Google's self-driving car); see Figure 1. Localization ground truth was obtained using a GPS/IMU unit, and object ground truth was obtained through manual annotation in 2D and 3D.


Figure 1: Mounted sensors for data acquisition (left), Velodyne HDL-64E (middle), data example from Velodyne HDL-64E (right).

To date, KITTI is one of the largest and most commonly used datasets for many of these tasks. For object detection in particular, KITTI is, alongside the Caltech Pedestrian dataset [1], an important dataset for autonomous driving applications. Geiger et al. also give a thorough overview of related datasets (see Table 1 in the paper). The benchmarks on the dataset's webpage reflect the current state of the art in the individual tasks. Judging from these numbers, KITTI does not yet seem to be completely "solved". Especially concerning runtime, most submitted algorithms are not ready for autonomous driving.

  • [1] P. Dollár, C. Wojek, B. Schiele, P. Perona. Pedestrian Detection: An Evaluation of the State of the Art. Pattern Analysis and Machine Intelligence, 2012.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.