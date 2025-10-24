Ashok Elluswamy, Tesla’s VP of Autopilot Software, delivered a keynote titled “Building Foundational Models for Robotics at Tesla” at the International Conference on Computer Vision (ICCV).

In Tesla's FSD, camera inputs feed a single large-scale, end-to-end neural network whose parameter count keeps growing and is expected to scale about 10x with new hardware.

Future Outlook

• Scale the robotaxi service to unsupervised, nationwide operation.

• Cybercab: a two-seat, low-cost autonomous vehicle built for robotaxi use, targeting a cost of travel below public transit.

• Extend to robotics: the same technology transfers to the Optimus humanoid robot (e.g., action-conditioned video generation for navigation).

Tesla’s approach scales across vehicles, locations, weather; emphasizes safety, comfort, speed.

• Tesla has access to a “Niagara Falls of data” — hundreds of years’ worth of collective fleet driving.

• Uses smart data triggers to capture rare corner cases (e.g., complex intersections, unpredictable behavior).
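
The talk summary does not describe how these triggers are implemented. As a rough illustration of the general idea only, the sketch below scans a stream of telemetry samples and keeps short clips around moments where a rule fires. The `Frame` fields, the two trigger rules, the 6 m/s² threshold, and the 2-second window are all assumptions made up for this example, not Tesla's design.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One hypothetical telemetry sample; field names are illustrative only."""
    t: float               # timestamp in seconds
    decel_mps2: float      # longitudinal deceleration (positive means braking)
    driver_override: bool  # True if the human took over at this sample


def hard_brake(frame, threshold=6.0):
    # Assumed rule: unusually strong braking marks an interesting moment.
    return frame.decel_mps2 >= threshold


def override(frame):
    # Assumed rule: a human takeover marks an interesting moment.
    return frame.driver_override


TRIGGERS = {"hard_brake": hard_brake, "driver_override": override}


def select_clips(frames, window_s=2.0):
    """Return (start, end, reasons) clips around frames that fire a trigger.

    Frames must be sorted by time; overlapping clips are merged.
    """
    clips = []
    for f in frames:
        fired = {name for name, rule in TRIGGERS.items() if rule(f)}
        if not fired:
            continue
        start, end = f.t - window_s, f.t + window_s
        if clips and start <= clips[-1][1]:
            prev_start, prev_end, prev_reasons = clips[-1]
            clips[-1] = (prev_start, max(prev_end, end), prev_reasons | fired)
        else:
            clips.append((start, end, fired))
    return clips


frames = [
    Frame(0.0, 0.5, False),
    Frame(1.0, 7.2, False),   # hard braking
    Frame(2.0, 1.0, True),    # driver takes over
    Frame(30.0, 0.2, False),  # uneventful, not kept
]
clips = select_clips(frames)
```

Here the two nearby events collapse into one clip tagged with both reasons, and the uneventful sample at 30 s is never uploaded, which is the point of triggering rather than recording everything.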

Quality and Efficiency:

• Extracts only the essential data needed to train models efficiently.

Debugging and Interpretability:

• Even though the system is end-to-end, Tesla can still prompt the model to output interpretable data: 3D occupancy, road boundaries, objects, signs, traffic lights, etc.

• Natural language querying: ask the model why it made a certain decision.

• These auxiliary predictions don’t drive the car but help engineers debug and ensure safety.

Tesla’s Advanced Gaussian Splatting (3D Scene Modeling):

• Tesla developed a custom, ultra-fast Gaussian splatting system to reconstruct 3D scenes from limited camera views.

• Produces crisp, accurate 3D renderings even from a few camera angles, which Tesla presented as far better than standard NeRF/splatting approaches.

• Enables rapid visual debugging of the driving environment in 3D.
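
Tesla's custom system is not public, so nothing here reflects its internals. For readers unfamiliar with the technique, the sketch below shows the front-to-back alpha compositing step used in standard Gaussian splatting to colour a single pixel: each splat contributes its colour weighted by its opacity and by how much light the nearer splats let through. The function name and the input layout are assumptions for this example.

```python
def composite_pixel(splats, background=(0.0, 0.0, 0.0)):
    """Front-to-back alpha blending of the splats covering one pixel.

    splats: iterable of (depth, alpha, (r, g, b)), where alpha is the splat's
    opacity already multiplied by its 2D Gaussian falloff at this pixel.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer splats
    for _, alpha, rgb in sorted(splats, key=lambda s: s[0]):  # nearest first
        weight = transmittance * alpha
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque, stop early
            break
    # Whatever light remains comes from the background.
    return tuple(c + transmittance * b for c, b in zip(color, background))


pixel = composite_pixel(
    [(5.0, 0.5, (0.0, 0.0, 1.0)),   # blue splat, farther away
     (2.0, 0.5, (1.0, 0.0, 0.0))],  # red splat, nearer
    background=(1.0, 1.0, 1.0),
)
```

The nearer red splat takes half of the pixel, the blue one takes half of what is left, and the white background fills the remaining quarter.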

Evaluation & World Models:

• Evaluation is the hardest challenge: models may perform well offline but fail in real-world conditions.

• Tesla builds balanced, diverse evaluation datasets focusing on edge cases — not just easy highway driving.
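
One simple way to keep an evaluation set from being dominated by easy highway miles is to sample a fixed number of clips per scenario type. The sketch below is a minimal illustration of that idea under assumed inputs; the scenario tags, clip ids, and per-scenario quota are hypothetical, and the summary does not say how Tesla actually balances its sets.

```python
import random
from collections import Counter, defaultdict


def balanced_eval_set(clips, per_scenario, seed=0):
    """Sample up to `per_scenario` clips from each scenario tag.

    clips: list of (clip_id, scenario_tag) pairs.
    """
    by_tag = defaultdict(list)
    for clip_id, tag in clips:
        by_tag[tag].append(clip_id)

    rng = random.Random(seed)  # fixed seed keeps the eval set reproducible
    selected = []
    for tag in sorted(by_tag):
        ids = by_tag[tag]
        k = min(per_scenario, len(ids))
        selected.extend((clip_id, tag) for clip_id in rng.sample(ids, k))
    return selected


# A raw pool that is almost entirely highway driving.
clips = (
    [(f"hw{i}", "highway") for i in range(1000)]
    + [(f"rb{i}", "roundabout") for i in range(5)]
    + [(f"cz{i}", "construction_zone") for i in range(3)]
)
eval_set = balanced_eval_set(clips, per_scenario=3)
counts = Counter(tag for _, tag in eval_set)
```

Although highway clips make up more than 99% of the pool, each scenario ends up with the same weight in the evaluation set.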

Tesla introduced a learned world simulator (a neural-network video generation engine):

• Can simulate 8 Tesla camera feeds simultaneously — fully synthetic.

• Used for testing, training, and reinforcement learning.

• Allows adversarial event injection (e.g., adding a pedestrian or vehicle cutting in).

• Enables replaying past failures to verify new model improvements.

• Can run in near real-time, letting testers “drive” inside a simulated world.
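
The workflow in the bullets above (roll a policy forward inside a simulator, inject an adversarial event, and replay the same scenario against a newer policy) can be sketched with a toy stand-in. In the code below, `toy_world_step` is a few lines of hand-written kinematics rather than a neural video generator, and the state fields, the cut-in event, and both policies are assumptions invented for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    ego_speed: float   # m/s
    gap: float         # metres to the nearest vehicle ahead
    lead_speed: float  # m/s


def toy_world_step(state, accel, dt=0.1):
    """Hand-written stand-in for a learned simulator: next state given an action."""
    ego_speed = max(0.0, state.ego_speed + accel * dt)
    gap = state.gap + (state.lead_speed - ego_speed) * dt
    return State(ego_speed, gap, state.lead_speed)


def rollout(policy, state, steps, events=None):
    """Closed-loop rollout; `events` maps a step index to a State -> State edit.

    Returns the smallest gap seen, so a negative value means a collision.
    """
    events = events or {}
    min_gap = state.gap
    for i in range(steps):
        if i in events:
            state = events[i](state)  # inject the adversarial event
        state = toy_world_step(state, policy(state))
        min_gap = min(min_gap, state.gap)
    return min_gap


def cut_in(state):
    """A slower vehicle merges 8 m ahead of the ego car."""
    return replace(state, gap=8.0, lead_speed=10.0)


def old_policy(state):
    return 0.0  # never brakes: stands in for a model with a known failure


def new_policy(state):
    # Brake hard whenever the time gap drops below 2 seconds.
    return -6.0 if state.gap < 2.0 * state.ego_speed else 0.0


start = State(ego_speed=15.0, gap=50.0, lead_speed=15.0)
scenario = {20: cut_in}  # the same injected event is replayed for both policies

old_min_gap = rollout(old_policy, start, steps=100, events=scenario)
new_min_gap = rollout(new_policy, start, steps=100, events=scenario)
```

Because the scenario is just data, the failing case can be stored and replayed against every new policy, which mirrors the "replay past failures" use described above.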

What’s Next:

• Scale robotaxi service globally.

• Unlock full autonomy across the entire Tesla fleet.

• Cybercab: next-gen 2-seat vehicle designed specifically for robotaxi use, targeting lowest transportation cost (cheaper than public transit).

• Same neural networks will power Optimus humanoid robot.

• The same video generation system is now being applied to Optimus.

• The system can simulate and plan movement for robots, adapting easily to new forms.

Source: International Conference on Computer Vision (ICCV).