Distributed AI Inference Will Capture Most of the LLM Value

Using AI models (inference) will be far more valuable than training them.

AI training feeds large amounts of data into a learning algorithm to produce a model that can make predictions. Training is how we create useful AI in the first place.

AI inference is where we do useful and valuable things with the trained AI.
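To make the distinction concrete, here is a minimal Python sketch: a one-parameter model is fit to example data (training), then used to make predictions on new inputs (inference). The model, data, and numbers are illustrative only, not anything from the article.

```python
# Minimal illustration of training vs. inference with a one-parameter
# linear model y = w * x, fit by gradient descent on squared error.

# Training data: pairs (x, y) sampled from y = 3x (the pattern to learn).
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

# --- Training: feed data into a learning algorithm to produce a model ---
w = 0.0                                  # model parameter, starts untrained
learning_rate = 0.01
for epoch in range(1000):
    for x, y in data:
        error = w * x - y                # prediction error on this example
        w -= learning_rate * error * x   # gradient step on squared error

# --- Inference: use the trained model to make predictions on new input ---
def predict(x):
    return w * x

print(f"learned w = {w:.3f}")               # ~3.0 after training
print(f"predict(10) = {predict(10):.1f}")   # ~30.0
```

Training happens once and is the expensive part at scale; the trained predict function can then be called billions of times, which is where the value accrues.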

Nvidia revealed that an H200 chip serving the latest open-source Llama 3 model can generate seven times more revenue over four years than the combined cost of buying and operating the chip.
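Nvidia's claim is a revenue multiple: token revenue earned over the chip's life versus its purchase and operating cost. The back-of-envelope sketch below reproduces the shape of that calculation; every input (chip price, power draw, throughput, token price) is an assumed placeholder, not a figure from Nvidia.

```python
# Back-of-envelope: revenue multiple for an inference chip over 4 years.
# All inputs are illustrative assumptions, not Nvidia's actual figures.

chip_cost_usd = 30_000          # assumed purchase price of one H200-class GPU
power_kw = 1.0                  # assumed average draw incl. cooling overhead
electricity_usd_per_kwh = 0.10  # assumed datacenter electricity price
years = 4
utilization = 0.8               # fraction of time the chip is serving tokens

tokens_per_second = 3_000       # assumed Llama-3-class serving throughput
usd_per_million_tokens = 0.50   # assumed price charged per million tokens

hours = years * 365 * 24 * utilization
operating_cost = power_kw * hours * electricity_usd_per_kwh
total_cost = chip_cost_usd + operating_cost

tokens_served = tokens_per_second * hours * 3600
revenue = tokens_served / 1e6 * usd_per_million_tokens

print(f"total cost:       ${total_cost:,.0f}")
print(f"revenue:          ${revenue:,.0f}")
print(f"revenue multiple: {revenue / total_cost:.1f}x")
```

The multiple is dominated by throughput and token price, not electricity, which is why the economics favor whoever can keep the most inference hardware busy.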

This means that whoever can build, deploy, and operate the most AI inference capacity will capture the most AI revenue.

Here I go over the details of how Tesla’s plan for a distributed AI inference system will let them deploy 10 to 100 times more AI inference capacity than competitors.

3 thoughts on “Distributed AI Inference Will Capture Most of the LLM Value”

  1. Bandwidth and latency problems will limit what one can achieve with the hardware sitting in vehicles. Sending video and bitmaps to a million cars for processing and expecting an immediate response is not realistic yet.

  2. This is still a fuzzy area because progress and development are ongoing.

    For example, matmul-free models, which require a lot less electrical and computing power and could be implemented in ASICs.

    The current kings of the hill might be dinosaurs in no time.

    • Yep. 2-bit (or literally 1.58-bit) neural network models have the potential to be MUCH more efficient when running on inference engines designed for that purpose. There’s a good chance H200s will soon be obsolete for this. (A toy sketch of the idea follows this thread.)
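For concreteness, the sketch below shows the idea behind the matmul-free and 1.58-bit models mentioned in the two comments above: weights are quantized to the three values {-1, 0, +1} (log2(3) ≈ 1.58 bits per weight), so a "multiply" by a weight reduces to an add, a subtract, or a skip. This is a toy illustration in the style of BitNet b1.58, not the actual kernels those papers use.

```python
# Toy illustration of ternary (1.58-bit) weights: each weight is -1, 0, or +1,
# so the dot product needs only additions and subtractions, no multiplies.
# Real implementations use packed, parallel ASIC/GPU kernels.

def quantize_ternary(weights):
    """Round weights to {-1, 0, +1} after scaling by their mean magnitude."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    return [max(-1, min(1, round(w / scale))) for w in weights], scale

def ternary_dot(tern_weights, scale, activations):
    """Dot product with ternary weights: add, subtract, or skip each input."""
    total = 0.0
    for w, a in zip(tern_weights, activations):
        if w == 1:
            total += a          # +1 weight: add the activation
        elif w == -1:
            total -= a          # -1 weight: subtract it
        # 0 weight: skip entirely (no multiply anywhere)
    return total * scale        # one multiply per output to undo the scaling

weights = [0.8, -1.1, 0.05, 0.9, -0.02]
acts = [1.0, 2.0, 3.0, 4.0, 5.0]

tern, scale = quantize_ternary(weights)
print(tern)                                    # [1, -1, 0, 1, 0]
print("ternary:", ternary_dot(tern, scale, acts))
print("exact:  ", sum(w * a for w, a in zip(weights, acts)))
```

On hardware built for this format, the adds and subtracts can be packed far more densely than floating-point multiplies, which is the basis of the efficiency claim.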
