Two Cheap Cameras Can Provide LiDAR-like Object Detection for Self-Driving

Cornell researchers have shown that two inexpensive cameras mounted on either side of the windshield can detect objects with nearly LiDAR’s accuracy at a fraction of the cost. Current LiDAR setups need three devices and cost $10,000 per car.


* This will make self-driving cars cheaper without requiring cheaper LiDAR.
* This will help Tesla, which relies on eight cameras and does not use LiDAR.
* Data should not be fed as-is to a machine learning system on the assumption that the AI will figure it out anyway. Changing the perspective from which the data is represented was key to the improvement.


Researchers found that analyzing the captured images from a bird’s-eye view rather than the more traditional frontal view more than tripled their accuracy, making stereo cameras a viable and low-cost alternative to LiDAR.

LiDAR sensors use lasers to create 3D point maps of their surroundings, measuring each object’s distance by timing how long a laser pulse takes to return at the speed of light. Stereo cameras work like human eyes, perceiving depth by combining two slightly different perspectives. Stereo camera accuracy in object detection has been low, and it was widely believed that stereo cameras were, and would remain, too imprecise.
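The two ranging principles described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the Cornell work: the function names and the camera values (a 700-pixel focal length and a 0.5 m baseline) are assumptions chosen only to make the arithmetic easy to follow. The stereo formula assumes a rectified camera pair.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact, by definition of the metre


def lidar_distance_m(round_trip_s: float) -> float:
    """Distance from a laser pulse's round-trip time (time of flight)."""
    # The pulse travels to the object and back, so halve the path length.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def stereo_depth_m(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth from stereo disparity for a rectified pair: Z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical values: the same object appears 35 px apart in the two images.
print(stereo_depth_m(disparity_px=35.0, focal_px=700.0, baseline_m=0.5))  # 10.0
# A pulse that returns after 200 nanoseconds hit something roughly 30 m away.
print(lidar_distance_m(round_trip_s=2e-7))
```

Note that stereo depth is inversely proportional to disparity, so distant objects produce very small disparities, which is one reason stereo was long considered too imprecise at range.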

Cornell researchers Wang and collaborators took a closer look at the data from stereo cameras. To their surprise, they found that the cameras’ depth information was nearly as precise as LiDAR’s. The gap in accuracy emerged, they found, in the way the stereo cameras’ data was being analyzed.

For most self-driving cars, the data captured by cameras or sensors is analyzed using convolutional neural networks, a kind of machine learning that identifies objects in images by applying filters that recognize the patterns associated with them. These networks have been shown to be very good at identifying objects in standard color photographs, but they can distort 3D information when it is represented from the front. When Wang and colleagues switched the representation from a frontal perspective to a point cloud observed from a bird’s-eye view, the accuracy more than tripled.
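The change of representation can be sketched as follows. This is an illustrative example under stated assumptions, not the researchers' code: it uses a simple pinhole camera model, and the function names, the toy 2x4 depth map, and the camera parameters (`focal_px`, `cx`, `cy`) are all hypothetical. It back-projects each pixel of a frontal depth map into a 3D point and then counts points per ground-plane cell as seen from above.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x right, y down, z forward), in metres


def depth_map_to_points(depth: List[List[float]], focal_px: float,
                        cx: float, cy: float) -> List[Point]:
    """Back-project each pixel (u, v) with depth z into 3D (pinhole model)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without a valid depth estimate
                continue
            x = (u - cx) * z / focal_px
            y = (v - cy) * z / focal_px
            points.append((x, y, z))
    return points


def birds_eye_grid(points: List[Point], x_range: Tuple[float, float],
                   z_range: Tuple[float, float], cell_m: float) -> List[List[int]]:
    """Count points per ground-plane cell, viewed from above (height dropped)."""
    cols = int((x_range[1] - x_range[0]) / cell_m)
    rows = int((z_range[1] - z_range[0]) / cell_m)
    grid = [[0] * cols for _ in range(rows)]
    for x, _y, z in points:
        col = int((x - x_range[0]) // cell_m)
        row = int((z - z_range[0]) // cell_m)
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] += 1
    return grid


# Toy depth map: the left half of the image is 10 m away, the right half 20 m.
depth = [[10.0, 10.0, 20.0, 20.0],
         [10.0, 10.0, 20.0, 20.0]]
points = depth_map_to_points(depth, focal_px=2.0, cx=2.0, cy=1.0)
grid = birds_eye_grid(points, x_range=(-20.0, 20.0), z_range=(0.0, 40.0), cell_m=10.0)
for row in grid:
    print(row)
# Row 1 (10-20 m ahead) holds the near surface, row 2 (20-30 m) the far one.
```

In the frontal image, the columns at 10 m and 20 m are direct neighbors, so a 2D filter sliding over the image mixes them together. In the bird's-eye grid they land in different rows, separated by their real distance, which is the kind of distortion the change of perspective avoids.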

Future Work

The researchers list several immediate directions along which their results could be improved in future work:
1. Higher-resolution stereo images would likely significantly improve the accuracy for faraway objects. The results were obtained with 0.4-megapixel images.

2. They did not focus on real-time image processing, and classifying all objects in one image takes on the order of 1 second. However, it is likely possible to improve these speeds by several orders of magnitude. Recent improvements in real-time multi-resolution depth estimation show that an effective way to speed up depth estimation is to first compute a depth map at low resolution and then incorporate high-resolution information to refine the previous result. Nextbigfuture notes that Tesla’s Full Self-Driving computer has 144 teraOPS (144 trillion operations per second) of processing power.

3. The conversion from a depth map to pseudo-LiDAR is very fast, and it should be possible to drastically speed up the detection pipeline through, for example, model distillation or anytime prediction.

4. The state of the art in 3D object detection could be improved through sensor fusion of LiDAR and pseudo-LiDAR. Pseudo-LiDAR has the advantage that its signal is much denser than LiDAR’s, and the two data modalities could have complementary strengths. The researchers hope their findings will revive image-based 3D object recognition and that the progress will motivate the computer vision community to fully close the image/LiDAR gap in the near future.
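The first item, that higher resolution should help most with faraway objects, follows from basic stereo geometry, which a short calculation can illustrate. The sketch below is not from the paper. It assumes a rectified stereo pair, a matching error that stays fixed at one pixel, and hypothetical camera values (700-pixel focal length, 0.5 m baseline). Under those assumptions the depth error grows with the square of the distance, and doubling the linear resolution (which doubles the focal length measured in pixels) halves it.

```python
def depth_error_m(depth_m: float, focal_px: float, baseline_m: float,
                  disparity_err_px: float = 1.0) -> float:
    """First-order depth error for Z = f * b / d: dZ ~= Z**2 * dd / (f * b)."""
    return depth_m ** 2 * disparity_err_px / (focal_px * baseline_m)


# Hypothetical camera: 700 px focal length, 0.5 m baseline, 1 px matching error.
for distance in (10.0, 20.0, 40.0):
    base = depth_error_m(distance, focal_px=700.0, baseline_m=0.5)
    # Same lens and sensor size with twice the pixels per side.
    sharper = depth_error_m(distance, focal_px=1400.0, baseline_m=0.5)
    print(f"{distance:5.1f} m: error {base:.2f} m, at 2x resolution {sharper:.2f} m")
```

With these illustrative numbers, an object four times farther away carries sixteen times the depth uncertainty, so any gain in resolution matters most at long range.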

arXiv – Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving.

3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies, a gap that is commonly attributed to poor image-based depth estimation. However, in this paper, we argue that data representation (rather than its quality) accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations, essentially mimicking the LiDAR signal. With this representation, we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance, raising the detection accuracy of objects within 30-meter range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission, our algorithm holds the highest entry on the KITTI 3D object detection leaderboard for stereo image-based approaches.

SOURCES: Cornell University, arXiv
Written by Brian Wang