Cerebras is a startup that makes wafer-scale AI chips. It is building a data center with those wafer-scale chips to provide very fast AI inference. The company's announcement lists:
‣ Llama3.1-70B at 450 tokens/s, 20x faster than GPUs
‣ 60 cents per million tokens, a fifth the price of hyperscalers
‣ Full 16-bit precision for full model accuracy
‣ Generous rate limits for developers
Nvidia shares its AI inference chips across many requests at once so that the same hardware can serve more people. A cluster of Nvidia H200s is designed to give AI answers to thousands of people at the same time. The 60-90 tokens per second this delivers is faster than most people can read. However, people routinely take in output from software faster than they read word by word. We scan a page of Google search results, for example, to pick out the information we want. This means it is valuable to get AI inference results at a higher speed, measured in tokens per second.
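To make the speed figures above concrete, here is a minimal Python sketch of the arithmetic. The 900-token answer length is an assumption chosen for round numbers, the function names are hypothetical, and the calculation ignores time to first token and network latency. The rates and the price are the ones quoted in this article.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a steady rate (ignores time to first token)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def cost_dollars(num_tokens: int, dollars_per_million_tokens: float) -> float:
    """Cost of num_tokens at a flat per-million-token price."""
    return num_tokens / 1_000_000 * dollars_per_million_tokens


ANSWER_TOKENS = 900  # assumed answer length, not a figure from the announcement

# Rates quoted in the article: 60-90 tokens/s on GPU clusters, 450 tokens/s claimed by Cerebras.
RATES = {
    "GPU cluster, low end (60 tokens/s)": 60,
    "GPU cluster, high end (90 tokens/s)": 90,
    "Cerebras claim (450 tokens/s)": 450,
}

for label, rate in RATES.items():
    print(f"{label}: {generation_seconds(ANSWER_TOKENS, rate):.1f} s")

# Price quoted in the announcement: 60 cents per million tokens.
print(f"Cost of the answer: ${cost_dollars(ANSWER_TOKENS, 0.60):.6f}")
```

Under these assumptions the same answer takes about 15 seconds at 60 tokens/s, 10 seconds at 90 tokens/s, and 2 seconds at 450 tokens/s, which is the difference between waiting for a response and being able to scan it almost immediately.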
One could imagine a future in which very fast AI inference is used to always give a quick, useful summary of the answer first, and then to supply elaboration and details on demand as the person interacts with it.

Introducing Cerebras Inference
Try now: https://t.co/50vsHCl8LM (Cerebras, @CerebrasSystems, August 27, 2024)



Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked the #1 Science News Blog. It covers many disruptive technologies and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker, and a guest on numerous radio and podcast interviews. He is open to public speaking and advising engagements.