China's DeepSeek AI matches the performance of leading frontier AI models while using only about 9% of their training compute. This continued improvement, achieving more capability with fewer resources, suggests we will reach AGI at affordable training and inference costs: AI will reach higher levels of intelligence while energy and computation become more efficient.
DeepSeek (Chinese AI co) is making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M).
For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints.
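The compute comparison above is simple arithmetic and can be checked directly. A minimal sketch, using only the GPU-hour figures reported in the post (the two models were reportedly trained on different GPU types, so GPU-hours are a rough proxy for total compute, not an exact measure):

```python
# Back-of-envelope check of the reported training compute figures.
# Numbers come from the post itself; GPU types differ between the two
# training runs, so this is a rough comparison only.

llama3_405b_gpu_hours = 30.8e6   # reported GPU-hours for Llama 3 405B
deepseek_v3_gpu_hours = 2.8e6    # reported GPU-hours for DeepSeek-V3

ratio = llama3_405b_gpu_hours / deepseek_v3_gpu_hours
fraction = deepseek_v3_gpu_hours / llama3_405b_gpu_hours

print(f"DeepSeek-V3 used ~{ratio:.0f}x fewer GPU-hours "
      f"({fraction:.0%} of Llama 3 405B's compute)")
# → DeepSeek-V3 used ~11x fewer GPU-hours (9% of Llama 3 405B's compute)
```

This is where both the "~11X less compute" and the "9% of the training compute" figures come from: 30.8M / 2.8M = 11, and 2.8M / 30.8M ≈ 0.09.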

— Andrej Karpathy (@karpathy) December 26, 2024


Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technologies and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.
[ “How much energy is required for an inference request to a server-hosted LLM (small: GPT-2, ~1.5 billion parameters; large: GPT-3, ~175 billion parameters), accounting for network structure and GPU/TPU hardware, and including local 10-400W client-side prompt processing (~150 characters), compared to a single search engine query (e.g. Google)?”
“Conclusion
For LLM inference, energy consumption is typically higher per request (especially for larger models) because of the significant computational resources required by the server (GPUs/TPUs) to process the prompt. A single inference request (depending on the model size) could consume 10-100 joules, with larger models requiring more energy.
For a search engine query, the energy consumption is generally lower (around 10-20 joules) per request, because search engines typically focus on index lookup, ranking, and returning results, which generally involves less computational complexity compared to generating text with an LLM.”
“Network and Data Centers: Google has highly optimized data centers, and the energy required to send data across the internet for a search query is relatively low, similar to the LLM request (about 0.01-0.1 joules).”
Could someone please verify these estimates from another perspective?
(Sending a request could have low power requirements on phones, ~1-5 W, but local LLM processing may draw ~50-400 W for low-end ‘consumer’ setups, depending on available accelerators and model parameter sizes. Thanks!) ]
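The commenter's figures can be put side by side for a rough sanity check. A minimal sketch, using only the ballpark numbers quoted above (these are the commenter's estimates, not measurements; the 2-second client-side duration is an assumption added here for illustration):

```python
# Rough comparison of the quoted per-request energy estimates.
# All ranges come from the quoted comment; none are measured values.

JOULES_PER_WH = 3600.0  # 1 watt-hour = 3600 joules

estimates_joules = {
    "LLM inference (per request)":    (10, 100),
    "Search engine query (per request)": (10, 20),
    "Network transfer (per request)": (0.01, 0.1),
}

for name, (lo, hi) in estimates_joules.items():
    print(f"{name}: {lo}-{hi} J "
          f"(~{lo / JOULES_PER_WH:.1e} to {hi / JOULES_PER_WH:.1e} Wh)")

# Client-side example: a 150 W local device processing a prompt for an
# assumed 2 seconds consumes 150 W x 2 s = 300 J, already comparable to
# or larger than the server-side figures quoted above.
client_energy_j = 150 * 2
print(f"Client-side (150 W x 2 s): {client_energy_j} J")
```

The takeaway matches the quoted conclusion: at the upper end, a large-model LLM request (~100 J) costs several times a typical search query (~10-20 J), while network transfer energy is negligible by comparison.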