China DeepSeek AI Is Over Ten Times More Efficient in AI Training

China's DeepSeek AI matches the performance of frontier AI models while using only about 9% of their training compute. Continued improvement in AI capability alongside more efficient use of resources suggests we will reach AGI at affordable costs for both training and inference. AI will reach higher levels of intelligence while energy and computation become more efficient.

DeepSeek (a Chinese AI company) is making it look easy today with an open-weights release of a frontier-grade LLM trained on a joke of a budget (2,048 GPUs for 2 months, $6M).

For reference, this level of capability was supposed to require clusters closer to 16K GPUs; the clusters being brought up today are more like 100K GPUs. For example, Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g., LLM arena rankings are ongoing; my few quick tests went well so far), it will be a highly impressive display of research and engineering under resource constraints.
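The arithmetic above is easy to verify. Here is a minimal sketch, in which the ~60-day training run and a ~$2 per GPU-hour rental rate are assumptions for illustration; the GPU counts and GPU-hour totals come from the figures above:

```python
# Sanity-check the DeepSeek-V3 vs. Llama 3 405B compute comparison.

LLAMA3_405B_GPU_HOURS = 30.8e6  # reported for Llama 3 405B
DEEPSEEK_V3_GPU_HOURS = 2.8e6   # reported for DeepSeek-V3

ratio = LLAMA3_405B_GPU_HOURS / DEEPSEEK_V3_GPU_HOURS
print(f"Compute ratio: ~{ratio:.0f}X less")            # ~11X
print(f"Share of Llama 3 compute: {1 / ratio:.1%}")    # ~9.1%

# Does "2048 GPUs for 2 months" line up with 2.8M GPU-hours?
gpus, days = 2048, 60                                  # ~2 months (assumed)
gpu_hours = gpus * days * 24
print(f"{gpus} GPUs x {days} days = {gpu_hours / 1e6:.2f}M GPU-hours")  # ~2.95M

# Rough training cost at an assumed $2 per GPU-hour market rate
print(f"Estimated cost: ${gpu_hours * 2 / 1e6:.1f}M")  # ~$5.9M, near the quoted $6M
```

At those assumed rates, 2,048 GPUs for two months come out near both the 2.8M GPU-hour figure and the quoted $6M, so the headline numbers are internally consistent.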

1 thought on “China DeepSeek AI Is Over Ten Times More Efficient in AI Training”

  1. “How much energy is required for an inference request to a server-hosted LLM (small: GPT-2, ~1.5 billion parameters; large: GPT-3, ~175 billion parameters), broken down across network infrastructure and GPU/TPU hardware, and including local 10-400W client-side prompt processing (~150 characters), compared with a single search engine query (e.g., Google)?”

    “Conclusion

    For LLM inference, energy consumption is typically higher per request (especially for larger models) because of the significant computational resources required by the server (GPUs/TPUs) to process the prompt. A single inference request (depending on the model size) could consume 10-100 joules, with larger models requiring more energy.

    For a search engine query, the energy consumption is generally lower (around 10-20 joules) per request, because search engines typically focus on index lookup, ranking, and returning results, which generally involves less computational complexity compared to generating text with an LLM.”

    “Network and Data Centers: Google has highly optimized data centers, and the energy required to send data across the internet for a search query is relatively low, similar to the LLM request (about 0.01-0.1 joules).”

    Could someone please verify these estimates from another perspective? (A rough sanity check is sketched after this comment.)
    (Sending a request needs little client-side power (phones draw ~1-5W), but running an LLM locally takes ~50-400W depending on the available accelerator and the model's parameter count, even at the low ‘consumer’ end. Thanks.)
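As a rough cross-check of the commenter's figures, here is a back-of-envelope sketch. Every power, latency, batching, and payload number below is an assumption chosen for illustration, not a measurement:

```python
# Back-of-envelope energy per request: batched LLM inference vs. a search query.
# All constants here are assumptions for illustration.

def llm_energy_j(gpu_power_w=350, latency_s=3.0, batch_size=32):
    """Server-side energy for one request when the GPU is shared across a batch."""
    return gpu_power_w * latency_s / batch_size

def client_energy_j(device_power_w=3, active_s=1.0):
    """Phone-side cost of composing and sending a ~150-character prompt (~1-5W)."""
    return device_power_w * active_s

def network_energy_j(payload_bytes=2_000, j_per_byte=2e-5):
    """Transport energy; lands in the 0.01-0.1 J range quoted above."""
    return payload_bytes * j_per_byte

llm_total = llm_energy_j() + client_energy_j() + network_energy_j()
print(f"LLM request:    ~{llm_total:.0f} J")  # ~36 J, inside the quoted 10-100 J band
print(f"Search request: ~15 J")               # assumed midpoint of the quoted 10-20 J
```

The result is dominated by batch size: without batching, the same 350W draw over 3 seconds would charge a single request with over 1,000 J, which is why serving stacks aggressively batch concurrent requests.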
