Karpathy Talks Tesla FSD Technical Details

David Lee summarized and analyzed Andrej Karpathy's recent technical presentation on Tesla FSD.

Andrej indicated that Tesla already has more than 1.8 exaflops of processing power (at FP16 precision) and 10 petabytes of storage in its AI training computer. This is the training supercomputer Tesla is using until the full Dojo system is set up.

Andrej also indicated that Tesla uses 221 event triggers for FSD.

Tesla spent four months solving depth estimation using vision alone, relying on a massive data set.

Tesla's FSD data set is large, diverse and clean. Training needs a large number of edge cases, and the 221 triggers are what Tesla uses to capture them.
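The article does not list the 221 triggers. As a rough sketch of the idea, assume each trigger is a named predicate over a per-frame summary, and a clip is flagged as an edge-case candidate when any trigger fires. The field names, trigger names and thresholds below are hypothetical, not Tesla's.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class FrameSummary:
    # Hypothetical per-frame signals; not Tesla's actual schema.
    driver_brake: bool
    autopilot_brake: bool
    vision_radar_gap_m: float      # disagreement between vision and radar range
    detection_flicker_count: int   # times an object blinked in/out of detection

# Each trigger is a named predicate over one frame. Names and thresholds are illustrative.
TRIGGERS: Dict[str, Callable[[FrameSummary], bool]] = {
    "driver_brake_without_autopilot_brake": lambda f: f.driver_brake and not f.autopilot_brake,
    "vision_radar_disagreement": lambda f: f.vision_radar_gap_m > 5.0,
    "detection_flicker": lambda f: f.detection_flicker_count >= 3,
}

def fired_triggers(frames: List[FrameSummary]) -> List[str]:
    """Return the sorted names of triggers that fired on any frame of a clip.
    A non-empty result would mark the clip as an edge-case candidate."""
    fired = set()
    for frame in frames:
        for name, predicate in TRIGGERS.items():
            if predicate(frame):
                fired.add(name)
    return sorted(fired)

clip = [
    FrameSummary(driver_brake=False, autopilot_brake=False, vision_radar_gap_m=0.5, detection_flicker_count=0),
    FrameSummary(driver_brake=True, autopilot_brake=False, vision_radar_gap_m=7.2, detection_flicker_count=1),
]
print(fired_triggers(clip))
# ['driver_brake_without_autopilot_brake', 'vision_radar_disagreement']
```

Keeping triggers as independent predicates means new ones can be added without touching the collection loop, which is one plausible way to grow a list to hundreds of triggers.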

Tesla has captured and is using over one million edge case video clips.

Tesla runs the system in shadow mode (the software runs passively in customer cars without controlling them) many times over. New software can then be tested against the million edge-case clips to see whether handling has improved.
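The article does not describe Tesla's evaluation tooling. A minimal sketch of the "test new software against the edge cases" step, done as an offline replay and assuming each clip carries a ground-truth distance label and each software version is a function returning a distance estimate. The clip fields and both "models" are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical clip record: a raw depth estimate plus a ground-truth label.
Clip = Dict[str, float]
Model = Callable[[Clip], float]

def compare_on_clips(old: Model, new: Model, clips: List[Clip]) -> Tuple[int, int, int]:
    """Replay every clip through both software versions and count how often
    the new version's distance error is lower, higher, or unchanged."""
    improved = regressed = unchanged = 0
    for clip in clips:
        truth = clip["true_distance_m"]
        old_err = abs(old(clip) - truth)
        new_err = abs(new(clip) - truth)
        if new_err < old_err:
            improved += 1
        elif new_err > old_err:
            regressed += 1
        else:
            unchanged += 1
    return improved, regressed, unchanged

clips = [
    {"raw_estimate_m": 22.0, "true_distance_m": 20.0},
    {"raw_estimate_m": 10.0, "true_distance_m": 10.0},
    {"raw_estimate_m": 50.0, "true_distance_m": 48.0},
]
old_model = lambda c: c["raw_estimate_m"]        # current release
new_model = lambda c: c["raw_estimate_m"] - 1.0  # hypothetical candidate with a bias fix
print(compare_on_clips(old_model, new_model, clips))  # (2, 1, 0)
```

Counting regressions separately from improvements matters: a candidate that fixes many clips but breaks a few previously handled ones may still not be shippable.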

According to Andrej, vision is the only scalable, low-cost solution for self-driving.

SOURCES- Tesla, David Lee Investing, Tesla Daily
Written By Brian Wang, Nextbigfuture.com (Brian owns shares of Tesla)

35 thoughts on “Karpathy Talks Tesla FSD Technical Details”

  1. If by military you mean insurgents in the Middle East using swarming drones, like personal-use DJI quadcopters rigged to drop grenades and mortar rounds, then the swarming drone revolution is happening now. It's highly lethal and very effective against ground troops, which holds off invading forces that don't carry RF jammers, since you need boots on the ground to occupy a place. Even jammers don't fully work against drones that navigate by dead reckoning and camera-based ground recognition in GPS-denied and remote-control-denied environments.

  2. The problem is the shared V2V sensor data itself. It's not Tesla cars per se: the data source could be a real car sending incorrect data (a misinterpreted environment), a car deliberately sending bad data, or a cellphone with a dongle spoofing a car. Either way, the receiving car has to make a trust judgement about data describing surroundings it can't see itself, and that judgement has consequences for path and speed planning.

    If every vehicle has a hard-coded certificate (think a digitally stamped VIN), you have no privacy in V2V unless extreme efforts are made, though that would mostly prevent spoofing. It doesn't solve the misinterpreted-environment problem.
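The comment separates two problems: authenticating the sender and trusting what the sender saw. A minimal sketch of both checks, with loud assumptions: the VIN-keyed registry and message fields are hypothetical, and a shared-key HMAC from the Python standard library stands in for the per-vehicle public-key certificates the comment describes.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional

# Hypothetical key registry indexed by VIN. Real V2V schemes would use
# public-key certificates; HMAC is only a stdlib stand-in here.
VEHICLE_KEYS: Dict[str, bytes] = {"VIN123": b"demo-secret-key"}

def _tag(key: bytes, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()

def sign(vin: str, payload: dict) -> str:
    return _tag(VEHICLE_KEYS[vin], payload)

def accept(vin: str, payload: dict, tag: str,
           own_distance_m: Optional[float] = None, tolerance_m: float = 3.0) -> bool:
    key = VEHICLE_KEYS.get(vin)
    if key is None:
        return False  # unknown sender
    if not hmac.compare_digest(_tag(key, payload), tag):
        return False  # spoofed or tampered message
    # Authentication shows who sent the data, not that the sender interpreted
    # the world correctly. Where our own sensors cover the same obstacle,
    # require agreement before trusting it.
    if own_distance_m is not None:
        return abs(payload["obstacle_distance_m"] - own_distance_m) <= tolerance_m
    return True

msg = {"obstacle_distance_m": 40.0}
tag = sign("VIN123", msg)
print(accept("VIN123", msg, tag))                            # True
print(accept("VIN123", {"obstacle_distance_m": 10.0}, tag))  # False: tampered
print(accept("VIN123", msg, tag, own_distance_m=60.0))       # False: authentic but implausible
```

Note that the sender's identity is baked into every message, which is exactly the privacy cost the comment points out.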

  3. Tesla already has people finding and reporting hacks that give them control over installed features. https://electrek.co/2020/08/27/tesla-hack-control-over-entire-fleet/
    What you are expressing is a concern that someone could upload an unauthorized software change to a number of cars that would falsify data being sent V2V, without the change affecting or being recognized by the host vehicles. That isn't theoretically impossible, but it is an order of magnitude harder to write valid code than to exploit an existing bug. Disgruntled employees would be among the few who might be able to do it, assuming they could check the changes into Tesla's own code repository. Otherwise you need another open channel that accepts and installs the software updates without validating authenticity. Now you have significant infrastructure requirements. I'd be more concerned about identifying an industrial/political competitor who has the resources to create this software and infrastructure than about someone sitting in their basement writing code for kicks. Of course, once it's developed and released as a software kit, then the basement coders can create their own tricks.

  4. You're correct, but the military has not taken drone warfare to its ultimate conclusion. In particular, it has not taken advantage of the cost reductions available from replacing leading-edge weapons platforms with fleets, orders of magnitude larger, of nearly as capable platforms.

  5. If rioters can put on masks and goggles in preparation for tear gas, then they can wear long sleeves, jeans and a face cover in preparation for a microwave pain ray.

    Sure the casual spur-of-the-moment trouble makers aren't going to be prepared, but they probably aren't the ones to drag you out of a car and physically damage/violate you anyway.

  6. Aren't you just describing drones, which have been the cool new thing in military hardware for a couple of decades now?

  7. But LIDAR doesn't force you to use the fully-mapped-planet approach; it merely allows you to use it. Nothing stops you from using other approaches, such as the one proposed for vision.

    I agree that relying on your system to have a complete and up to date map of the planet is a fool's errand. Especially as road systems can change within minutes.

  8. I've read about cases where a Tesla braked because its forward radar bounced a signal under the car in front of it, and picked up that the car in front of that had suddenly braked. That was something the driver couldn't see.

    This is one capability they'll lose by eliminating radar.

  9. Consider the effect of this "FSD" hardware on military doctrine. One reason military equipment is so absurdly expensive is to give the operators a better chance of survival.
    Presumably, there's a sweet spot between top-end equipment and hundred-dollar flying bombs. Not having to worry about the deaths and replacement of your personnel lets you find and use it.
    I'm thinking of something with roughly the performance of 1970s equipment, value-engineered to lower manufacturing costs, and designed to come home, be serviced and reloaded, and fly another sortie. Instead of one $13 million Apache attack helicopter, you get 13 to 26 Huey Super Cobra-equivalent helicopters, and they are more politically correct to boot! You don't even have to carry around all the hardware needed to accommodate the pilot.
    Maybe four or five M60-equivalent main battle tanks, with much better targeting software and hardware, for one late-model Abrams? Just consider the cost difference between a diesel engine and a gas turbine.
    A huge flotilla of PT-boat-class hydrofoils riding on a commandeered container ship used as a "PT carrier", instead of any capital ship, even a destroyer? Once again, compare the price of an Otto- or Diesel-cycle engine with a ship's prime movers. Think of the communications redundancy you'd have with all those mesh network nodes. The enemy would have trouble blocking communications.

  10. Here's an idea. Vehicle radars should have a "pedestrian danger mode" (PDM). In this mode, the radar would emit the wavelength used by military area-denial "pain rays", which produce the sensation of intense heat on any skin not covered by an electrical conductor. The phased array would shape its beam towards human-sized roadway obstructions that did not strongly reflect, with a preference for obstructions moving towards the vehicle. That would move those BLM/ANTIFA chuckleheads out of your way.
    You could claim it was a safety feature, and that the "small" amount of pain could save a life, sort of like appendectomies. PDM refers to lessening danger for pedestrians. How could the NHTSA say no?
    As a bonus, it might be amusing to chase them around with your car. Don't want to be irradiated? Then take your violent virtue signaling off the roadway!

  11. I guess portable IT would not be affected. If electronics are a few centimeters from an active antenna, they must be relatively microwave immune. If it handles a wi-fi card, I guess your desktop is OK too.

  12. Yes, the need for a sanity check would decline as the algorithms were proven out. That might take a while, though.

  13. Nothing. By the time the radar intensity got anywhere near the level where it might produce health effects, you'd be pouring so much power into radar that people would be complaining about it cutting down their range on a charge.

  14. But if you're doing that, your lidar is nothing but a sanity check for the real algorithm which is the vision, and as the reliability of the main algorithm increases, the sanity check becomes less and less useful.

    Particularly if the lidar isn't some sort of phased array which can be aimed in microseconds so it can check any given point in front of the car rather than a point at a predetermined height as it is today.

  15. What are the health effects of hundreds of cars with radar on in a city? It seems the safe bet is visual FSD. If the visual becomes confused it can slow or stop the car just like a person would.

  16. Even without IR capability, modern digital cameras are effectively night vision gear, capable of generating usable images in light levels you or I would be blind in. You don't need IR for night driving.

    Where IR is handy is driving in fog, because it penetrates much better than visible wavelengths.

    Normal digital image sensors reach significantly into IR and UV, and are filtered to exclude them because focusing gets harder the wider the spectral band you're doing it over. But you can get RGB filters that allow color discrimination in a different form, which allows IR vision with somewhat poorer color discrimination.

    It's quite possible that Tesla has specified the color filters on their cameras to allow a bit of IR capability.

  17. I wouldn't use lidar to navigate in a high resolution map, that's a dead end for many reasons, but mainly because the map can never be up to date enough.

    I'd have it to perform "sanity checks" on the vision system. Don't build up a depth image of the surroundings, just ping each identified object to confirm the distance and relative speed the vision system has derived.

    If they don't match within specified limits, you know you've got problems.

    A lidar capable of that doesn't have to have nearly the capacity of one capable of scanning the entire surroundings.
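The sanity check described in this comment amounts to comparing two per-object estimates against tolerances. A minimal sketch, assuming both the vision system and the lidar report distance and closing speed keyed by a shared tracked-object ID; the object IDs and the 1 m and 0.5 m/s limits are illustrative placeholders, not values from any real system.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Estimate:
    distance_m: float
    closing_speed_mps: float

def sanity_check(vision: Dict[str, Estimate], lidar: Dict[str, Estimate],
                 max_dist_err_m: float = 1.0, max_speed_err_mps: float = 0.5) -> List[str]:
    """Return IDs of objects the lidar ping could not confirm, or whose vision
    distance or closing speed falls outside the allowed error limits."""
    flagged = []
    for obj_id, v in vision.items():
        l = lidar.get(obj_id)
        if l is None:
            flagged.append(obj_id)  # no lidar return for this object
            continue
        if (abs(v.distance_m - l.distance_m) > max_dist_err_m
                or abs(v.closing_speed_mps - l.closing_speed_mps) > max_speed_err_mps):
            flagged.append(obj_id)
    return flagged

vision = {"car_1": Estimate(30.0, 2.0), "ped_1": Estimate(12.0, 0.0)}
lidar = {"car_1": Estimate(30.4, 2.1), "ped_1": Estimate(15.0, 0.0)}
print(sanity_check(vision, lidar))  # ['ped_1']
```

Because the lidar only has to range the objects vision already found, the check needs one measurement per object rather than a full scan of the surroundings, which is the capacity argument the comment makes.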

  18. Hahaha R.Kimhi! Even if they fail, why would it be a "colossal landmark"? Because of the 200 USD cost of the FSD computer in each car, or perhaps the few billion USD plowed into FSD software? All of this is pocket change once FSD is "cracked"… And this is true even if some other company does it…

  19. Ah, but DrPat, you misunderstand!

    Lidar allows using pre-defined high-res maps because you can easily match your "voxel" image in the car to the high-res map. This gives you your position in the high-res map and hence tells you where your path should take you. The mapping itself could in principle be done by any sensor, even though it's probably done with lidar as well.

    Now, the detractors of this solution claim that it's very time-consuming to update these high-res maps, and you could still end up in situations where the map is outdated. You could also end up in places that are not mapped, where you therefore cannot use your FSD car.

    Contrast this with a general FSD system that does not use high-res mapping. All you need is a coarse map, because you will be "seeing" the surroundings and calculating the correct path from "scratch". This means that you can drive in places that have not been mapped at all.
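The map-matching step described in this comment can be illustrated with a toy correlative matcher. This sketch assumes a 2D occupancy grid instead of 3D voxels and searches translation only (no rotation); real localization stacks are far more elaborate, and the grid, cells and search window here are made up for illustration.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]

def localize(map_cells: Set[Cell], scan_cells: Set[Cell], search: int = 5) -> Tuple[Cell, int]:
    """Brute-force correlative matching, translation only: try every integer
    offset within +/-search cells and return the offset that places the most
    scan cells on occupied map cells, together with that overlap count."""
    best_offset, best_score = (0, 0), -1
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            score = sum((x + dx, y + dy) in map_cells for x, y in scan_cells)
            if score > best_score:
                best_offset, best_score = (dx, dy), score
    return best_offset, best_score

# Pre-built map: a wall segment and a kerb, as occupied grid cells.
map_cells = {(10, 10), (10, 11), (10, 12), (14, 10), (15, 10)}
# What the car "sees" in its own frame, displaced from the map frame by (3, -2).
scan_cells = {(x - 3, y + 2) for x, y in map_cells}
print(localize(map_cells, scan_cells))  # ((3, -2), 5)
```

The recovered offset is the car's position in the map frame, which is why this approach depends on the map being current: if the wall or kerb has moved, the best match can land in the wrong place.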

  20. It turns out you are correct. Apparently, lidar is projected to cost about 600 USD by 2024 [1]. And I guess 600 USD is acceptable if it brings FSD capability, even in a "non-taxi" car. For robotaxis it could cost 10k USD and still be a "bargain".

    It's interesting because it contradicts what James Douma said during his last interview with Warren Redlich. He thought that lidars were unlikely to become cheap. Well, I guess even he can be wrong occasionally.

    [1] https://compoundsemiconductor.net/article/111900/LiDAR_Dropping_Prices_And_Low_Volumes

  21. "Andrej indicated that Tesla already has over 1.8 Exaflop of processing power "

    (Has been edited; I was wrong about 1.8 Exaflop).

    It's 1.8 Exaflop at FP16. Typically, supercomputing is specified at FP64. If you want to translate FP16 to FP64, FP64 is 4^2 = 16 times "harder", so those 1.8 Exaflop at FP16 would be equivalent to about 110 Petaflop of FP64. That would put it at about fifth place on the list of supercomputers [1].

    [1] https://en.wikipedia.org/wiki/TOP500
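The conversion in this comment is a single division. A tiny sketch that makes the assumed slowdown factor explicit; the 16x (4^2) figure is the commenter's assumption, and the real FP16-to-FP64 throughput ratio depends on the hardware.

```python
def fp64_equivalent_pflops(fp16_pflops: float, fp64_slowdown: float = 4 ** 2) -> float:
    """Convert an FP16 throughput figure into a rough FP64 equivalent by
    dividing by an assumed slowdown factor (default: the commenter's 4^2 = 16)."""
    return fp16_pflops / fp64_slowdown

fp16_pflops = 1800.0  # 1.8 exaflops = 1,800 petaflops at FP16
print(fp64_equivalent_pflops(fp16_pflops))  # 112.5
```

The result, 112.5 petaflops, is what the comment rounds to "about 110 Petaflop".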

  22. I'm not seeing how the computational approach (predictive, rather than detailed pre-mapping) is dependent on the sensor tech used to create the local instantaneous map.
    Not to mention, it could take a lot more computation to create a local map from visual sensors that have to work out distance from a 2D image, than using sensors that give you that data directly.

  23. They are using eight colour video cameras, each 1280×960 at 24 bits per pixel and 36 fps. I suspect that the spectral resolution is essentially the same as that of any digital camera and the average human eye.

    They said nothing about driving at night, although one might be forgiven for thinking that there are night-driving clips among the million or so they are using to train the FSD net. I guess that if you are liable to drive at night down unmarked roads with your lights busted there might be cause for concern, but down notoriously well-maintained US interstate highways? I can't imagine that they are much better than you or I in that environment today, let alone three years from now.

  24. When he says "only scalable low-cost solution" he's not referring to the cost of the sensor, but to the (primarily computational) cost of maintaining accurate, inch-scale resolution LIDAR maps of essentially all driveable paths through the world. In this sense, being able to derive good distance metrics and, in particular, to understand "on the fly" which way a car is able to go without relying on precomputed maps is a much, much more scalable solution even if radars were free.

  25. The driver in front of me waving me forward when there's an oncoming car doesn't scale. And he has to do it himself and see the carnage. Few people are sociopathic enough to bring themselves to do that.

    To do it at a remove, when they don't have to watch the blood be shed? More people are THAT sociopathic, and like I said, spoofing can scale.

  26. Problem with edge cases is they are potentially infinite though, so generating enough "general" edge cases is a tough nut to crack. Humans also display varying levels of cognition regarding edge cases as well, so it's not like we do much better.

    But that recent video of the Tesla FSD slightly freaking out over a flatbed truck carrying inactive traffic lights was certainly amusing. The visualized FSD recognition behavior where a traffic light seems to materialize onto the road from the rear of the truck and approaches the car is rather interesting, in that it appeared to assume traffic lights can't be mobile, and when it specifically recognized one light, it became temporarily "fixed" in space.

  27. V2V comms is going to be a minefield. Never mind the privacy issues, the whole trust angle is a hard problem. But look at the current human model: if another driver signals information about a blind corner, how far do you trust it? I think that relates to high-trust societies as well, but there's the old phrase "trust, but verify" that you ultimately have to work with.

  28. "Vision [is] the only scalable low-cost solution for self-driving."

    I'm not so sure that's true, as the cost of lidar and radar has been dropping remarkably fast.

    But inasmuch as our roads are set up for drivers who rely on vision, vision by itself should be sufficient for human-level driving, especially if it's not limited to the visible spectrum. (IR penetrates fog better.)

    What we have to avoid are solutions, like inter-communication between vehicles, which would leave self-driving vehicles vulnerable to spoofing. Despite how tempting they will be in terms of increased performance.

  29. Maybe I missed something. Read through most of the 'edge' cases, but my brain fogs over after a while.

    Are they using light detection in the range that humans can detect? What is the wavelength range? What is the wave amplitude range? Does night-time driving have a separate set of rules?

    I plan on buying a Tesla in the next 3 years, but a big part of that is FSD that is bulletproof on interstate highways, where I can rest while the car handles everything. Being able to eat and urinate while the car drives itself for hours across large, boring swaths of the US is highly appealing…

    What? Too much personal info?…

  30. Still mega computing will not solve full autonomy. In fact, their failure will be so colossal that it will be a landmark in understanding the limits of AI.

