Could Tesla Mass Produce Dojo AI Cloud Systems

August 22, 2021 by Brian Wang

Tesla revealed its Exapod Dojo supercomputer made from 10 cabinets of servers. Each tray has 6 tiles with 9 petaflops in each tile, and each cabinet holds two trays for about 100 petaflops.

The fastest AI supercomputer is Nvidia Selene with 2.8 Exaflops of performance.

In 2020, Nvidia Selene was 1 Exaflop and cost about $56 million. The Nvidia Selene was 280 DGX A100 AI compute cabinets. AI supercomputers are cheaper than general purpose supercomputers like the $1 billion Fugaku (400-500 Petaflop) supercomputer.

Tesla indicates that the Dojo Exapod has 5X smaller footprint and 4X the performance. Dojo 2 will have 10X the performance of Dojo 1.

I think Tesla was able to spend less than the cost of the Nvidia Selene. If Tesla could spend half the cost of the 2020 Nvidia Selene, then this would be about $28 million. Spread over 110 tiles, with networking and other gear counted in each tile's share, that means each tile currently costs about $255,000.
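
The per-tile estimate is simple division, and a minimal sketch makes the assumptions easy to change. The inputs are the figures used above ($56 million for the 2020 Selene, half of that for Dojo, 110 tiles); the helper name `cost_per_tile` is hypothetical and exists only for illustration.

```python
# Back-of-the-envelope per-tile cost, using the figures from the article.
SELENE_2020_COST_USD = 56_000_000  # article's cost figure for the 2020 Nvidia Selene
DOJO_COST_FRACTION = 0.5           # assumption: Tesla spent half of the Selene cost
TILE_COUNT = 110                   # tile count used in the article


def cost_per_tile(system_cost_usd: float, tiles: int) -> float:
    """Spread the whole system cost (networking and other gear included) over the tiles."""
    if tiles <= 0:
        raise ValueError("tiles must be positive")
    return system_cost_usd / tiles


dojo_cost_usd = SELENE_2020_COST_USD * DOJO_COST_FRACTION
per_tile_usd = cost_per_tile(dojo_cost_usd, TILE_COUNT)

print(f"Assumed Dojo cost: ${dojo_cost_usd:,.0f}")
print(f"Cost per tile:     ${per_tile_usd:,.0f}")
```

Changing `DOJO_COST_FRACTION` or `TILE_COUNT` shows how sensitive the per-tile number is to those two guesses.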

If Tesla started mass production of the tiles to make Dojo data centers and to sell AI cloud services, then they could drive the cost per tile down to $10-30k.

This would provide more revenue for Tesla which could fund more AI hardware and software development and it would support a larger Tesla AI team.

Tesla could have 100 Tesla Dojo AI Clouds and 10 dedicated Tesla AI Dojos. There would be five Dojos for FSD and five for Tesla bot.

I think Tesla will mass produce the Dojo Exapods and create a Tesla AI cloud service. This would be $10B+/year in revenue.

Mass producing one hundred Dojos could bring the cost down to $10 million each. One hundred would be $1 billion.

Tesla will need multiple or larger Dojos for Tesla FSD training and for Tesla bot training.
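
As a rough sketch of how the fleet numbers relate, the snippet below multiplies out the hardware cost and compares it with the $10B per year cloud revenue estimate. All three inputs are the article's guesses, and the comparison ignores power, staffing, facilities and other operating costs, so it is illustrative only.

```python
# Fleet hardware cost versus the article's cloud revenue estimate.
DOJO_COUNT = 100                              # mass-produced Dojo systems
MASS_PRODUCED_COST_USD = 10_000_000           # assumed cost per Dojo at volume
CLOUD_REVENUE_USD_PER_YEAR = 10_000_000_000   # article's "$10B+/year" estimate

fleet_cost_usd = DOJO_COUNT * MASS_PRODUCED_COST_USD

# Hardware cost as a fraction of one year of the estimated revenue.
# Operating costs are deliberately left out of this sketch.
hardware_share = fleet_cost_usd / CLOUD_REVENUE_USD_PER_YEAR

print(f"Fleet hardware cost: ${fleet_cost_usd:,.0f}")
print(f"Share of one year's estimated revenue: {hardware_share:.0%}")
```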

SOURCES – Forbes, PCGamer, Tesla
Written by Brian Wang, Nextbigfuture.com


17 thoughts on “Could Tesla Mass Produce Dojo AI Cloud Systems”

Brian Wang

August 31, 2021 at 7:05 am

I said in February 2021 that a stalemate is the best the US could do in the Middle East. I was right. Biden f..d up and pulled everything out and triggered the collapse. Just like the pullout in Iraq overseen by Biden. Pulled out 150k soldiers from 2009-2011. Resulting in the rise of ISIS. ISIS required the US going back. ISIS was destroyed in the prior administration. Iran was contained under Trump. Biden will f that up too. Venezuela is down to 529,000 barrels per day of oil production. 5.2 million people fled Venezuela. One sixth the economy. Venezuela has joined Cuba and North Korea as broke and broken countries.
Asteroza

August 26, 2021 at 4:21 am

Considering Tesla allegedly is/was the owner of one of the most poorly maintained and skankiest VMware clusters in the US (and a SPoF for Tesla car services online to boot), I would expect them to at least work a little on that first.

Easiest would be an AWS partnership to colocate with them. Leave the datacenter stuff to the pros, especially since DOJO looks like it might be a DC-centric design, rather than AC. DC power datacenters are becoming a thing with hyperscaler-sized companies.
Anonymous

August 24, 2021 at 11:21 pm

Just keep in mind, current ML architectures are insufficient to deliver lvl5 and no amount of compute changes that. All the AI compute anyone could possibly want has been readily available, it's not like no one in the world has been able to solve lvl5 due to a lack of compute. Lvl5 is still a scientific problem, I hope they're also investing in research now that they've decided to stop focusing on the minor leagues that is Autopilot.
Anonymous

August 24, 2021 at 2:19 pm

Ok, good to know (or not, because it would be nice if you got more revenue) and valid points. I agree that Elon has made some astonishing achievements. I disagree, however, on the perception of how far ahead some of his companies are, on how politics and the market will respond to the rapid shift, on the dates, and on whether some claims will come true at all. Too much optimism, really, but it's ok, only time will tell.
If I may, I will drop an idea for a possible article or even a series of articles. You have been doing this for some years now, so it would be interesting to see which predictions turned out to be true and which didn't, maybe an article for each topic (nuclear energy, computation, AI, military…).
I don't see many websites doing this and I think it would be interesting and revealing.
Regardless, keep up the good work mate.
Anonymous

August 24, 2021 at 8:45 am

Another interesting comparison. The "mesh" bandwidth of CS-2 is 27 PB/s. The exapod does not have the exact equivalent, but if you count all the I/O going inside of a RD2-chip and sum over the whole system, you get 2500 PB/s in each direction. But of course, between RD2-chips you "only" have 9 TB, so this is not an apples-to-apples comparison.

Within the RD2 chip you have 442 MB of SRAM, so each "chunk" of about half a GB of memory is stitched together with an exceptional bandwidth per unit of SRAM memory.
Anonymous

August 24, 2021 at 8:38 am

Hm.. I've thought about it and perhaps DOJO could be a step up even from CS-2. The grand total of SRAM memory in CS-2 is 40 GB, which, let's be honest, is not a whole lot. But the SRAM bandwidth is great, a whopping 2.5 PB/s. Now an Exapod (Tesla) has 660 GB of SRAM, an order of magnitude greater than the CS-2 system. Does this mean that the SRAM bandwidth is also 10 times greater? Probably. I would say that Cerebras wins in terms of SRAM bandwidth per watt for their system (23 kW compared to 900 kW), by a factor of 3 (23 kW / 2.5 PB; 900 kW / (2.5 PB * 15)).

Of course, this is not really a fair comparison. Tesla uses 10 racks and Cerebras only 1 rack. But Tesla can connect their racks in an indefinite chain whose "side-link" is 27 TB. Compare that to ~0.15 TB for a Cerebras rack. Tesla wins by 2 orders of magnitude. And Cerebras is using vanilla ethernet to connect their racks, which is probably not conducive to low latencies, whereas Tesla uses their own proprietary protocol (low latency).

I.e. it would seem that Tesla's system is more scalable. Of course, we do not have a clue of the amount of DRAM in the Tesla system, nor the bandwidth. Is there more DRAM than SRAM in a Tesla Exapod? What is the bus bandwidth? And what is the bandwidth to/from permanent storage? What size is the permanent storage?
Anonymous

August 23, 2021 at 11:07 pm

https://www.amazon.com/Problems-Associated-Artificial-Intelligence-Book-ebook/dp/B099NBH8MV/ref=sr_1_1?dchild=1&qid=1629759990&refinements=p_27%3AHuseyin+Gurkan+Abali&s=books&sr=1-1-catcorr
Anonymous

August 23, 2021 at 6:57 pm

Indeed; it's ten 1500 W electric showers. You can definitely heat some water and make a thermal water spa next door or something. ☺
Anonymous

August 23, 2021 at 5:33 pm

Yes, at the moment at least, if we are looking for a real tech breakthrough for better computing with AI cloud services, the company to watch is Cerebras. Their large chip could make them the top provider for the entire industry. So they could become the Intel of AI cloud services.

What I am seeing from Tesla is a nice engineering effort. But AWS, Google, Microsoft, Nvidia and IBM have nice engineering efforts too. So Tesla can jump into that arms race if they want to. Those companies together are likely to spend over a hundred billion dollars in R&D on their neural network cloud services in the coming decade.

I do understand why Tesla may want its own data centers top to bottom for its own needs though. Tesla has achieved the goal of becoming a major tech company, so they can do more stuff like this in-house. But selling it to outside customers is another story. I am pretty sure all of those companies above are not going anywhere on that front.
Brian Wang

August 23, 2021 at 5:01 pm

I am not shilling on Musk. I do not get paid for Musk articles. There is very little advertising revenue for this site. When I get paid articles, comments are shut off for those. I have invested in Tesla stock and a tiny amount in SpaceX stock. Nothing I say or do moves the valuation of those stocks. I try to tell my readers those are good and valuable companies that will become far more valuable. I get no benefit from this; I am hoping readers would listen and benefit in the coming years.

People may not like it but Elon Musk is changing the world and making disruptive future technology happen.

Starlink will generate more annual revenue than NASA budget before 2025.
Tesla ramping current factories (Fremont, Shanghai, Berlin, Austin) by 2023 should have 4-5 million cars and $150-250 billion in revenue. Tesla-elon will have $30-60B/year to do further world changing things.

More factories will be started in 2022. Matching CATL in batteries would be another $200-400B in valuation. Adding cloud could be $20-100 B in more revenue.
Brian Wang

August 23, 2021 at 4:49 pm

Ok, I covered the laser fusion progress. 8X more power from one shot. 1000X improvement needed to get to actual true breakeven. The current NIF nuclear weapons research facility will cap out with another 12X improvement.
https://www.nextbigfuture.com/2021/08/what-the-recent-progress-in-laser-fusion-means.html
This laser fusion path is over 100 years from meaningful nuclear fusion for any useful purpose. It is weapons-related research. Rapid laser pulsing is an entirely different path.
William Readling

August 23, 2021 at 3:37 pm

15 kW is 51000 BTU/hr, you could heat a house with one of these, assuming good insulation, and tight envelope.
Anonymous

August 23, 2021 at 3:27 pm

What I would like to know is how great the SRAM bandwidth to the training nodes is. Does it compare favorably to CEREBRAS?

Also note that DOJO is only 30% better per W, compared to GPUs. And by the time DOJO is built (next year), GPUs will be better. So DOJO 2.0 is the thing to look out for.

But maybe Tesla's Dojo computer has a greater "loading factor", i.e. that it actually reaches the theoretical performance in real training, as opposed to GPU-based systems. James Douma is of the opinion that the FSD computer (HW3) is far, far better utilized compared to GPU-based solutions, so perhaps the same is true here as well…
Anonymous

August 23, 2021 at 3:23 pm

First off, 10B in annual revenue is pocket change for the future Tesla, so I don't really see the upside of offering AI training as a service. Using all the compute in getting the humanoid robot working would generate trillions in stock value.

Also, even when Tesla does not need its own AI compute for in-house projects, it would make more sense to make the complete SW for customers and then rent it to them. Tesla contributes the SW know-how (architecture, real world AI knowledge) and possibly HW and gets a recurring revenue in exchange. This business model could generate some serious money…
Anonymous

August 23, 2021 at 2:48 pm

It makes sense that Tesla would invest in growing the Dojo infrastructure for the same reason it made sense for Amazon to grow AWS infrastructure. It's central to their business, and it reduces basic costs for their business at the same time it brings in new revenue to support more growth and lower costs. If they're gonna build Dojos for themselves, why not get good at it by building as many as possible for anyone to use as a cloud service? Tesla has a substantial lead in this subset of cloud computing now and the capital to do this.
Anonymous

August 23, 2021 at 2:03 pm

I too think that Brian tends to shill on Musk's projects. But it's understandable that after the Tesla AI conference there is a greater influx of articles about what was discussed. He is also producing content that targets an audience and generates clicks, and it happens that most people here are Musk fanboys.
Anonymous

August 23, 2021 at 12:48 pm

Oh, you mean that fusion is generating net energy? No? Wake me up when it does.
