Tesla FSD With Same Testers Show FSD 12.5.1.3 is 4X Safer Than V 12.3.6

TeslaFSDTracker.com, a community crowdsourced dataset, has had 260+ testers sign up and report data.

@raines1220 compared FSD testers who have used all versions of FSD and found a large improvement between versions.

v12.5.1.3 is ~14x better overall and ~4x safer than v12.3.6

Most testers have not been consistent. There’s only a handful of people who consistently report data for each version, and even fewer of that handful have 12.5. All of this can be viewed on the tester view tab using filters, so it’s not hidden. I don’t think we’re at a point in 12.5 where we can draw any real conclusions. This is such a weird situation, with a large majority of people not having the latest version. A rough estimate is that 70-80% of testers within the last 6 months don’t have 12.5. That’s a huge chunk of data missing from what is represented on the tracker and from the analysis by @raines1220.

If we take the same testers in v12.3.3 and v12.3.6 (i.e., testers 1, 196, 166, 135, 7, 169, 106, 236, 109, 141, 79, 58, 190), we’ll see that v12.3.6 is actually 1.86x better than v12.3.3.
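A minimal sketch of this kind of same-tester comparison, in Python. The tester labels and all mileage and intervention numbers below are made up for illustration, and pooling miles and interventions across the common testers is an assumption; the source does not say how the 1.86x figure was computed.

```python
from collections import defaultdict

# Hypothetical rows: (tester, version, miles driven, interventions).
# These numbers are invented for illustration, not tracker data.
reports = [
    ("A", "12.3.3", 400.0, 4),
    ("A", "12.3.6", 600.0, 3),
    ("B", "12.3.3", 300.0, 2),
    ("B", "12.3.6", 300.0, 1),
    ("C", "12.3.6", 900.0, 1),  # reported only one version, so excluded
]


def same_tester_ratio(rows, old, new):
    """Compare miles per intervention using only testers present in both versions."""
    versions_by_tester = defaultdict(set)
    for tester, version, _, _ in rows:
        versions_by_tester[tester].add(version)
    common = {t for t, seen in versions_by_tester.items() if {old, new} <= seen}

    def miles_per_intervention(version):
        # Pool miles and interventions over the common testers (an assumption).
        miles = sum(m for t, v, m, _ in rows if t in common and v == version)
        events = sum(n for t, v, _, n in rows if t in common and v == version)
        return miles / events  # raises ZeroDivisionError if there were none

    return common, miles_per_intervention(new) / miles_per_intervention(old)


common, ratio = same_tester_ratio(reports, "12.3.3", "12.3.6")
print(sorted(common), round(ratio, 2))
```

Restricting to the common testers removes differences in who is reporting, but with only a handful of testers the ratio can swing a lot on a single drive.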

What is the relationship between miles-to-intervention and miles-to-accidents? According to the National Safety Council, motor vehicle collisions between two cars account for 71% of all incidents. If we consider that all accidents are basically two cars having a collision, then the probability of each accident can be simplified as the product of each driver making a critical mistake: p(accident) ≈ p(driver critical mistake)^2. I would argue that “miles to critical intervention” is more similar to “miles to critical mistakes” than to “miles to accidents”. According to the NHTSA, the US has recently averaged about 670K miles per accident, which means the probability of an accident per mile is 1/670K. Taking the square root, this implies that the probability of a critical mistake per mile is ~1/819. That means a human driver makes a critical mistake every 819 miles, which suggests that FSD is very close to human driver safety now.
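The arithmetic behind the 819-mile figure can be checked with a short Python sketch. It only reproduces the simplified model stated above (every accident treated as two independent driver mistakes); the variable names are illustrative.

```python
import math

MILES_PER_ACCIDENT = 670_000  # figure quoted in the text

p_accident = 1 / MILES_PER_ACCIDENT      # accident probability per mile
p_mistake = math.sqrt(p_accident)        # from p(accident) ≈ p(mistake)^2
miles_per_mistake = 1 / p_mistake        # equals sqrt(670,000)

print(round(miles_per_mistake))  # 819
```

The result is sensitive to the model: if a single driver's mistake were enough to cause a crash, the exponent would be 1 and the implied figure would be 670K miles instead.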

Thinking about “critical interventions” during human supervised FSD and comparisons to actual human driver error rates

Human drivers crash every ~half million miles (depending on the strictness of the crash definition).

But human drivers run a red light every 3,000 miles.

If you were supervising FSD and it looked like it was going to blow through a red, clearly you would intervene (and label that intervention critical).

But if it doesn’t run a red light at least once every three months or so then it is actually performing better than the average human. That it was about to miss a red light does not mean that it was about to get into an accident.

This sort of discrepancy could go part of the way toward explaining the difference between the intervention rate on the FSD community tracker (1 every 300 miles on 12.5) and Elon’s commentary on measured critical error rates (one accident per year on 12.4).
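To put these rates on a common per-year footing, here is a rough Python sketch. The annual mileage of 12,000 is an assumed round number, not a figure from the source, and the 300 is read as miles between interventions; the other inputs are the numbers quoted above.

```python
ANNUAL_MILES = 12_000          # assumption: round figure for one driver's year
MILES_PER_RED_LIGHT = 3_000    # human red-light rate quoted above
MILES_PER_INTERVENTION = 300   # tracker rate quoted above for 12.5

# How often a human would run a red light, in months.
months_per_red_light = 12 * MILES_PER_RED_LIGHT / ANNUAL_MILES

# Events per year at the assumed annual mileage.
red_lights_per_year = ANNUAL_MILES / MILES_PER_RED_LIGHT
interventions_per_year = ANNUAL_MILES / MILES_PER_INTERVENTION

print(months_per_red_light, red_lights_per_year, interventions_per_year)
```

Under that assumption, a red light every 3,000 miles works out to one every three months, while one intervention every 300 miles is about 40 per year, far more than one accident per year. That gap is consistent with the point that most critical interventions would not have ended in a crash.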

4 thoughts on “Tesla FSD With Same Testers Show FSD 12.5.1.3 is 4X Safer Than V 12.3.6”

  1. Interesting article. Not sure how realistic it is vs how much is motivated reasoning/’cope’, but adds something to see evolution for specific testers.

    It’s hard to view the general all-submissions trajectory of the results as particularly positive when, prima facie, 12.5 had 2x the distance between disengagements of the latest 12.5.1.3. Forward progress seems uncertain/elusive, which is a pretty big concern in and of itself; the process/development paradigm is not reliably producing improvement.

    Still seems to be at least 2-3 orders of magnitude away from being able to satisfy regulatory bodies that FSD is good enough for un-monitored robo taxis to work in the wild.

  2. I noticed something interesting on the TeslaFSDTracker website. July and August 2023 had fewer drives with critical disengagements than July and August 2024!!! So it has gotten worse in some aspects. The improvements are so tiny that it is basically a waste of time. There were promises that 12.5 would go 10-20k miles without a disengagement. The data fail to confirm that.

    • It may also be that the type of driver buying Teslas has broadened to include more people who intervene/don’t trust FSD. The data is messy in so many ways, it’s hard to make any meaningful comparisons, but for insurance purposes, only accidents per mile and accident severity will matter in the end. Tesla lowering premiums on its own insurance represents a conflict of interest big enough to drive a cybertruck through, so I wouldn’t count that.

  3. The cars with FSD 12.5 are HW4 only currently which has much more compute power.

    Please do a comparative analysis of HW4 to HW3 when FSD 12.5 is available.
