Geiger et al. introduce the KITTI dataset, which includes benchmarks for vision tasks central to autonomous driving and advanced driver assistance systems: stereo, optical flow, 2D/3D object detection, and visual SLAM. KITTI was the first dataset to provide (mostly) dense ground truth for stereo and optical flow on realistic urban scenes. This ground truth was obtained with the mounted Velodyne laser scanner, a Velodyne HDL-64E (see Figure 1; the videos at the bottom of the linked page show some cool applications, including Google's Self-Driving Car). Localization ground truth was obtained using a GPS/IMU unit, and object ground truth was obtained by manual annotation (in 2D and 3D).
Figure 1: Mounted sensors for data acquisition (left), Velodyne HDL-64E (middle), data example from Velodyne HDL-64E (right) taken from here.
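To make the idea of laser-based stereo ground truth more concrete, here is a minimal sketch of how 3D laser points could be turned into a sparse disparity map. This is not the authors' pipeline; it only assumes a rectified pinhole camera and the standard relation disparity = focal length × baseline / depth. The function name `lidar_to_disparity` and its parameters (`T_cam_lidar`, `K`, `baseline`) are hypothetical placeholders for calibration data, not KITTI's actual API or file format.

```python
import numpy as np


def lidar_to_disparity(points, T_cam_lidar, K, baseline, image_size):
    """Project laser points into a rectified camera and convert depth to disparity.

    points:      (N, 3) array of 3D points in the laser frame.
    T_cam_lidar: (4, 4) rigid transform from laser to camera frame (assumed calibration).
    K:           (3, 3) pinhole intrinsics of the rectified camera (assumed calibration).
    baseline:    stereo baseline in meters (assumed calibration).
    image_size:  (height, width) of the image.
    Returns a (height, width) float32 array; 0 marks pixels without a measurement.
    """
    height, width = image_size

    # Transform the points into the camera frame using homogeneous coordinates.
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (T_cam_lidar @ homogeneous.T).T[:, :3]

    # Keep only points in front of the camera.
    cam = cam[cam[:, 2] > 0]

    # Sort from far to near so that nearer points overwrite farther ones
    # when several points fall onto the same pixel.
    cam = cam[np.argsort(-cam[:, 2])]

    # Pinhole projection to pixel coordinates.
    pixels = (K @ cam.T).T
    u = np.round(pixels[:, 0] / pixels[:, 2]).astype(int)
    v = np.round(pixels[:, 1] / pixels[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)

    # disparity = f * B / Z for a rectified stereo pair.
    disparity = np.zeros((height, width), dtype=np.float32)
    disparity[v[inside], u[inside]] = K[0, 0] * baseline / cam[inside, 2]
    return disparity
```

A single scan projected this way covers only a fraction of the pixels, which is one reason laser-based ground truth is described as "(mostly) dense" rather than fully dense.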
To date, KITTI is one of the largest and most commonly used datasets for many of the mentioned tasks. For object detection in particular, KITTI is, alongside the Caltech dataset, an important dataset for autonomous driving applications. Geiger et al. also give a thorough overview of related datasets (see Table 1 in the paper). The benchmarks on the dataset's webpage reflect the current state of the art in the individual tasks. According to these numbers, KITTI does not seem to be completely "solved". Especially concerning runtime, most submitted algorithms are not ready for autonomous driving.