Using AIs will be far more valuable than training them.
AI training feeds large amounts of data into a learning algorithm to produce a model that can make predictions. Training is how we make an AI that is useful.
AI inference is where we do useful and valuable things with the trained AI.
Nvidia revealed that an H200 chip running the latest open-source Llama 3 model can generate seven times more revenue over four years than the combined cost of buying and operating the chip.
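As a rough illustration of what that ratio implies, here is a back-of-envelope calculation in Python. The dollar figures and power draw are my own assumptions for the sketch, not Nvidia's numbers:

```python
# Back-of-envelope check of the "7x revenue over four years" claim.
# All inputs below are illustrative assumptions, not Nvidia's figures.

chip_cost = 30_000        # assumed purchase price of one H200, USD
power_draw_kw = 0.7       # assumed average draw, kW (H200 is rated up to ~700 W)
electricity_price = 0.10  # assumed USD per kWh
years = 4

hours = years * 365 * 24
operating_cost = power_draw_kw * hours * electricity_price
total_cost = chip_cost + operating_cost

revenue = 7 * total_cost  # the claim: revenue is 7x total cost over four years

print(f"Total 4-year cost: ${total_cost:,.0f}")
print(f"Implied revenue:   ${revenue:,.0f}")
print(f"Implied profit:    ${revenue - total_cost:,.0f}")
```

Under these assumptions, one chip would return roughly $195,000 of profit on about $32,000 of cost over four years.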
This means whoever can build, deploy and operate the most AI inference capacity will capture the most AI revenue.
Here I go over the details of how Tesla's plan for a distributed AI inference system could let them deploy 10 to 100 times more AI inference capacity than competitors.
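A sketch of where a large multiple could come from, comparing idle fleet compute against a big GPU cluster. Every input below is an illustrative assumption, and the resulting ratio is very sensitive to them; reaching 10 to 100x depends on fleet growth and per-car hardware improvements:

```python
# Back-of-envelope: idle inference compute across a vehicle fleet
# versus a large datacenter GPU cluster. All inputs are assumptions.

fleet_size = 7_000_000   # assumed future Tesla fleet with inference hardware
tops_per_car = 100       # assumed usable INT8 TOPS per car (HW3/HW4 class)
availability = 0.5       # assumed fraction of time a parked car can serve inference

cluster_gpus = 50_000    # assumed competitor datacenter of H200-class GPUs
tops_per_gpu = 2_000     # assumed INT8 TOPS per datacenter GPU

fleet_tops = fleet_size * tops_per_car * availability
cluster_tops = cluster_gpus * tops_per_gpu

print(f"Fleet:   {fleet_tops / 1e6:,.0f} million TOPS")
print(f"Cluster: {cluster_tops / 1e6:,.0f} million TOPS")
print(f"Ratio:   {fleet_tops / cluster_tops:.1f}x")
```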
Bandwidth and latency problems will limit what one can achieve with the hardware sitting in vehicles. Sending video and bitmaps to a million cars for processing and expecting an immediate response is not realistic yet.
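To put numbers on that objection (the per-car bitrate is an assumption; the million-car figure is from the comment above):

```python
# Aggregate bandwidth needed just to stream compressed video to a fleet.
# Bitrate is an illustrative assumption.

cars = 1_000_000   # a million cars, per the example above
bitrate_mbps = 5   # assumed compressed 1080p video stream per car, Mbit/s

aggregate_tbps = cars * bitrate_mbps / 1_000_000
print(f"Aggregate downlink: {aggregate_tbps:.1f} Tbit/s sustained")
# ~5 Tbit/s sustained, before any results flow back, plus cellular
# round trips adding tens of milliseconds before inference even starts.
```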
This is still a fuzzy area because progress and development are ongoing. For example, matmul-free models require far less electrical and computing power and could be implemented in ASICs.
The current kings of the hill might be dinosaurs in no time.
Yep. 2-bit (or literally 1.58-bit) neural network models have the potential to be MUCH more efficient when running on inference engines designed for that purpose. There's a good chance H200s will soon be obsolete for this.
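A minimal sketch of the idea behind 1.58-bit (ternary) weights, in the style of BitNet b1.58: each weight is constrained to {-1, 0, +1}, so a matrix multiply reduces to additions and subtractions of the input's columns. The quantization roughly follows the absmean rule described in the BitNet b1.58 paper; the rest is illustrative:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Quantize weights to {-1, 0, +1} with one per-matrix scale (absmean rule)."""
    scale = np.abs(w).mean() + 1e-8
    w_q = np.clip(np.round(w / scale), -1, 1)
    return w_q.astype(np.int8), float(scale)

def ternary_matmul(x: np.ndarray, w_q: np.ndarray, scale: float) -> np.ndarray:
    """x @ W with ternary weights: only additions and subtractions, no multiplies."""
    pos = w_q == 1
    neg = w_q == -1
    cols = [x[:, pos[:, j]].sum(axis=1) - x[:, neg[:, j]].sum(axis=1)
            for j in range(w_q.shape[1])]
    return np.stack(cols, axis=1) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32)).astype(np.float32)
x = rng.normal(size=(4, 64)).astype(np.float32)
w_q, s = ternary_quantize(w)

# The multiply-free path matches an ordinary matmul with the quantized weights.
print(np.allclose(ternary_matmul(x, w_q, s),
                  x @ (w_q.astype(np.float32) * s), atol=1e-4))
```

Hardware built around this representation can skip multiplier circuits entirely, which is where the large efficiency gains over general-purpose GPUs would come from.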