Two Cheap Cameras Can Provide LiDAR-like Object Detection for Self-Driving

Cornell researchers have shown that two inexpensive cameras mounted on either side of the windshield can detect objects with nearly LiDAR’s accuracy at a fraction of the cost. Current LiDAR implementations need three devices and cost $10,000 per car.


* This will make self-driving cars cheaper without depending on LiDAR itself becoming cheaper
* This will help Tesla, which works with eight cameras and does not use LiDAR
* Data should not be fed as-is to machine learning on the assumption that the AI will figure it out anyway. Changing the perspective was key to the improvement.


Researchers found that analyzing the captured images from a bird’s-eye view rather than the more traditional frontal view more than tripled their accuracy, making stereo cameras a viable and low-cost alternative to LiDAR.

LiDAR sensors use lasers to create 3D point maps of their surroundings, measuring objects’ distance via the speed of light. Stereo cameras work like human eyes by combining two perspectives. Object detection with stereo cameras has been inaccurate, and the cameras were widely believed to be too imprecise and likely to remain so.
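
As a rough illustration of how two perspectives yield distance: for a rectified stereo pair, the depth of a point follows from its pixel shift (disparity) between the left and right images as z = f * b / d. The sketch below is not taken from the Cornell work. The function name, the 720-pixel focal length and the 0.54 m baseline are illustrative assumptions.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters for a rectified stereo pair: z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Assumed camera parameters (hypothetical, for illustration only).
FOCAL_PX = 720.0
BASELINE_M = 0.54

# Effect of a one-pixel disparity error on a nearby object (about 10 m away).
near_error = (depth_from_disparity(37.88, FOCAL_PX, BASELINE_M)
              - depth_from_disparity(38.88, FOCAL_PX, BASELINE_M))

# Effect of the same one-pixel error on a distant object (about 100 m away).
far_error = (depth_from_disparity(3.0, FOCAL_PX, BASELINE_M)
             - depth_from_disparity(4.0, FOCAL_PX, BASELINE_M))

print(f"one-pixel error near: {near_error:.2f} m, far: {far_error:.2f} m")
```

Because depth is inversely proportional to disparity, the same one-pixel error costs far more at long range than at short range.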

Cornell researchers Wang and collaborators took a closer look at the data from stereo cameras. To their surprise, they found that the stereo data was nearly as precise as LiDAR data. The gap in accuracy emerged, they found, when the stereo cameras’ data was being analyzed.

For most self-driving cars, the data captured by cameras or sensors is analyzed using convolutional neural networks – a kind of machine learning that identifies images by applying filters that recognize patterns associated with them. These convolutional neural networks have been shown to be very good at identifying objects in standard color photographs, but they can distort the 3D information if it’s represented from the front. So when Wang and colleagues switched the representation from a frontal perspective to a point cloud observed from a bird’s-eye view, the accuracy more than tripled.
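
The change of representation can be sketched in a few lines. The example below assumes a standard pinhole camera model and a depth map that is already estimated. It back-projects every pixel into a 3D point (a pseudo-LiDAR point cloud) and then counts points per ground cell as seen from above. The function names, the intrinsics `fx`, `fy`, `cx`, `cy` and the grid sizes are illustrative assumptions, not the researchers' code.

```python
import numpy as np


def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) point cloud.

    Camera frame: x to the right, y down, z forward (all in meters).
    """
    h, w = depth.shape
    v, u = np.indices((h, w))          # v = pixel row, u = pixel column
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]    # drop pixels without valid depth


def birds_eye_view(points, x_range=(-20.0, 20.0), z_range=(0.0, 40.0), cell=0.5):
    """Count points per ground-plane cell, ignoring height (top-down view)."""
    n_x = int(round((x_range[1] - x_range[0]) / cell))
    n_z = int(round((z_range[1] - z_range[0]) / cell))
    grid, _, _ = np.histogram2d(points[:, 2], points[:, 0],
                                bins=[n_z, n_x], range=[z_range, x_range])
    return grid                        # rows = forward distance, cols = lateral


# Toy input: a flat wall 10 m ahead, seen by a tiny 4 x 6 pixel camera.
depth = np.full((4, 6), 10.0)
cloud = depth_map_to_points(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0)
bev = birds_eye_view(cloud)
print(cloud.shape, bev.shape)
```

In the top-down grid the wall occupies a single row at its true distance, whereas in the frontal depth image neighboring pixels can belong to objects that are far apart in 3D.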

Future Work

There are multiple immediate directions along which the results could be improved in future work:
1. Higher-resolution stereo images would likely significantly improve the accuracy for faraway objects. The results were obtained with 0.4-megapixel images.

2. The researchers did not focus on real-time image processing, and classifying all objects in one image takes on the order of 1 s. However, it is likely possible to improve these speeds by several orders of magnitude. Recent improvements in real-time multi-resolution depth estimation show that an effective way to speed up depth estimation is to first compute a depth map at low resolution and then incorporate high-resolution information to refine the previous result. Nextbigfuture notes that Tesla’s full self-driving computer has 144 teraOPS (144 trillion operations per second) of processing power.

3. The conversion from a depth map to pseudo-LiDAR is very fast and it should be possible to drastically speed up the detection pipeline through e.g. model distillation or anytime prediction.

4. The state of the art in 3D object detection could be improved through sensor fusion of LiDAR and pseudo-LiDAR. Pseudo-LiDAR has the advantage that its signal is much denser than LiDAR, and the two data modalities could have complementary strengths. These results could revive image-based 3D object recognition and motivate the computer vision community to fully close the image/LiDAR gap in the near future.

Arxiv – Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving.

3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies, a gap that is commonly attributed to poor image-based depth estimation. However, in this paper, we argue that data representation (rather than its quality) accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations, essentially mimicking the LiDAR signal. With this representation, we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance, raising the detection accuracy of objects within 30-meter range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission, our algorithm holds the highest entry on the KITTI 3D object detection leaderboard for stereo image-based approaches.

SOURCES- Cornell University, Arxiv
Written By Brian Wang

29 thoughts on “Two Cheap Cameras Can Provide LiDAR-like Object Detection for Self-Driving”

  1. But it must also be true that you can resolve distances via stereo vision *coarsely* at fairly large distances.

    Say you have two cameras 1 m apart, and that you are using HD cameras. Each camera has, say, a 50-degree field of view, which equates to about 1/40th of a degree of resolution per pixel. This would mean resolving a pixel at 1 km from a pixel at 1.8 km.
    tan^-1(1/1000) = theta = 0.057 deg;
    tan(theta - 1/40 deg) = tan(0.032 deg) = 0.00056
    => dist = 1.77 km

    Now, few objects consist of just one pixel… The angular resolution for an object that consists of 100 pixels should be – handwaving – about 100 times better, i.e. 1/4000 of a degree. This equates to resolving an object at 1 km distance from an object at 1.004 km distance, i.e. a depth resolution of 4 m at 1 km.

    Now 4 m resolution at 10 m distance is bad, but I would argue that at 1 km, it’s just fine. Particularly as you can combine several sequential (pairs of) images that each have 4 m resolution to yield even better results.
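
The arithmetic in this comment can be checked with a short script. It is a minimal sketch that follows the commenter's simplified model: the parallax angle is atan(baseline / distance), and two distances are distinguishable once that angle differs by one resolution step. The function name is hypothetical.

```python
import math


def resolvable_distance(distance_m, baseline_m, angular_res_deg):
    """Distance at which the parallax angle is one resolution step smaller."""
    theta_deg = math.degrees(math.atan(baseline_m / distance_m))
    return baseline_m / math.tan(math.radians(theta_deg - angular_res_deg))


# Single pixel: 1 m baseline, object at 1 km, 1/40 degree per pixel.
single_pixel = resolvable_distance(1000.0, 1.0, 1 / 40)

# 100-pixel object: the comment's hand-waved 1/4000 degree resolution.
object_100px = resolvable_distance(1000.0, 1.0, 1 / 4000)

print(f"single pixel: {single_pixel:.0f} m, 100-pixel object: {object_100px:.1f} m")
```

Both results agree with the figures quoted above: roughly 1.77 km for a single pixel and roughly 1.004 km for the 100-pixel object.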

  2. That’s some interesting information. I have to say that my subjective impression, just shutting one eye while driving, is that two eyes give me better depth perception at least 50 feet out. But, of course, I’m not totally lacking in depth clues at that point, and it’s a subjective impression.

    Two feet apart would be about 16 times greater separation, of course, and you could probably get out to 4-5 feet on a car without changes to the overall footprint of the vehicle.

  3. I have a FLIR in one of my cars, and can detect out to about 400m. It has saved my butt many times, e.g., deer/moose at night. The car also has all sorts of other driver assist tech, radar etc. Next year’s model I’ll be able to get the LiDAR option too (Magna+Innoviz). The expectation is that in 2-3 years all this tech will be cost effective at the $30k/vehicle price point.

    So the tech is out there, and knowing how car companies excel at making zillions of things cost effectively, the tech will be available for all. But – the problem still remains. The tech accomplishes the 95/5 rule in covering most potential accidents (notwithstanding alcohol related). The remaining 5% is really, really hard and comprises 95% of the cost in order to reach acceptable autonomous driving. It isn’t the hardware tech, really. It is a combination of software and adoption rates. Current “AI” software design is not at all adequate, neither artificial nor intelligent. There is a very long way to go to reach adoption critical mass.

  4. A friend of mine did his masters on distance perception in machine vision and he tells me that the famous human binocular vision really only works out to a couple of metres, i.e. about what we could realistically hope to reach.

    Beyond that we actually get our distance perception from other cues

    • Comparison to other objects whose size and position we already know (or think we know)
    • Angle changes as our head changes position
    • Just guessing based on what size we think an object is

    As a result people with only one (working) eye are far more capable at navigating the world than you would naively assume just from the binocular theory.

    Having lost the sight in my left eye for a couple of months I can confirm this matches my experience. I did experience problems and misjudgements, but they were only with close things. I’d have trouble grabbing a rice cooker, but no issue driving or riding my bike. AND I would have trouble grabbing a rice cooker, but not my coffee cup. I think that’s because I knew very well what the size of the cup was so I could guess distance easily, but the rice cooker was something I only used once a month or so.

    (Off topic, but the rice cooker springs to mind because of another problem. I was looking for the rice cooker, opened a cupboard, “There it is.” Go to grab it… and found myself trying to grab the photograph on the front of the rice cooker box. I was fooled by the 2D photo because I was only seeing in 2D. )

  5. Not really; Humans do an adequate job of resolving distance using eyes placed only a couple inches apart. A couple feet would be enormously better, and you could fit that onto a self-driving motorcycle without changing the form factor.

    You don’t really need to resolve distances down to millimeters for driving purposes.

  6. The added benefit of Thz scanning tech is that, if your car DID hit a pedestrian, it could diagnose their broken bones!

  7. Bug splats are a serious problem. Human drivers sit some distance behind the windshield, unlike cameras, which usually sit right at the front and can thus be blinded by a single bug splat or bird splash.

  8. I nearly did a search to see if there were any existing images of “Robot hammerhead shark girlfriend” but I’m wise enough that I know I really don’t want to know.

  9. The biggest bang for the buck would easily be better driver training (simulations & track), and increasing driver awareness of conditions (better visibility, mirrors, lighting, etc.).

    But that’s commonsense, and unsexy, so instead we get Idiocracy-tier crap.

  10. Evidently we’ve learned nothing from the 737MAX tragedy about the dangers of increasing systems/automation complexity in an attempt to remove the human operator from the system.

    I thought we had reached peak Clown World, but it’s clear we have a ways to go.

  11. Actually the main reason to have many cameras would be to handle poo, bug strikes, equipment damage.

    Two cameras means that when one goes down then the car can’t self drive and a human who may not have controlled a car in a year or so takes over.

  12. I always wonder when up-armored HMMRs are going to ditch the armored glass windshield (to be replaced with sloped armor) and the driver drives with a headset on and has 360 degree vision.

    Semis too I suppose.

  13. People pay more for safety. Some people won’t entrust their life and the life of their family members to things that can be fooled with single points of failure.

    The promise of self driving cars is always that they are far superior to normal human drivers.

    By means of comparison in the last three decades I have driven almost 800,000 miles and have caused no accidents.

  14. “But then also, why should it just be LIDAR vs Vision – why not both?”

    Totally agree. Throw in radar and you are getting good. Ideally I would like to augment visible light vision with infra-red vision. Visible light machine vision has all the same problems visible light human vision has (rain, fog, snow, sand).

  15. Antennae that extend up and out in town, and retract at highway speeds, would optimise both modes. If they pulled right in in park, they’d be safe from vandalism and bird poo too.

  16. Good news is that the android girlfriends of the future will be able to make you a coffee in the kitchen without knocking everything over.
    Bad news is she’ll look like a hammerhead shark.
    Good news is the alternative was an 8 eyed spider face.

  17. But LIDAR doesn’t have to stay expensive – and when human lives are involved, the need for redundancy becomes critical. There should be an effort to include both, even if cost reductions have to be developed.

  18. You’re right – one could conceivably even have some kind of mast extensions to project the eyes upwards (think of the eye-stalks on crabs), while also serving as communication antennae.

  19. I think you could put cameras on the side mirrors as an extreme. Doubtful it would be worth the efficiency loss to make a larger cross-section.

  20. This is where the engineer brings up the budget. If we are designing a lunar lander, you would be correct. Few expenses should be spared. But $10k for lidar is non-trivial in the context of a car, especially when it brings little incremental value over cheap vision+AI.

  21. But then also, why should it just be LIDAR vs Vision – why not both? One could imagine circumstances where LIDAR (of the right wavelength) could be superior to vision – nighttime, blizzard, heavy rain, fog, spray from running over water puddles, etc. There’s no inherent reason for things to be mutually exclusive – redundancy is better.

  22. Agree on the big rigs. I drove one professionally for almost a decade. They have many blind spots.

  23. So it sounds like the greater the separation distance (ie. the greater the perspective change), the better the accuracy. So the larger the frontal cross-section of the vehicle, the better vision is possible. Will this factor have a fundamental influence on the shape of self-driving cars? Perhaps a tall-boy design will make a comeback (despite higher risk of vehicle rollover). Maybe big-rig semi trucks will benefit the most.

  24. Cameras will only get smaller and more numerous. They will have a LOT more than 2 per car. It will probably be more like a flat array of cameras on each side of the car.

  25. Nothing new here.

    You should really have more than two cameras. Say six:

    Left & Right headlight, each corner of the windshield.

    More cameras don’t really cost that much more and you really want redundancy. Bug splats, dust, flaky cameras, etc.
