Here is an LLM Hardware Calculator.
If you enter the size of the LLM you want to run locally, the calculator will provide the GPUs, memory, and other primary specifications needed to run it.
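The calculator's exact formulas aren't reproduced here, but the same kind of back-of-the-envelope estimate can be sketched in a few lines of Python. The 20% overhead factor and the 80 GB per-GPU figure below are illustrative assumptions, not values taken from the calculator:

```python
def estimate_requirements(params_billion: float,
                          bytes_per_param: float = 2.0,    # FP16/BF16 weights
                          overhead: float = 1.2,           # assumed ~20% for KV cache/activations
                          vram_per_gpu_gb: float = 80.0):  # e.g. one A100/H100-class card
    """Rough estimate of memory and GPU count needed just to hold a model."""
    weights_gb = params_billion * bytes_per_param           # ~1 GB per billion params per byte
    total_gb = weights_gb * overhead
    gpus_needed = int(max(1, -(-total_gb // vram_per_gpu_gb)))  # ceiling division
    return {"weights_gb": round(weights_gb, 1),
            "total_gb": round(total_gb, 1),
            "gpus_needed": gpus_needed}

# Example: a 70B-parameter model in FP16
print(estimate_requirements(70))
# ~140 GB of weights, ~168 GB total, 3 x 80 GB GPUs
```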

The most useful result from that calculator is this: to get Grok-level performance locally, even allowing for much slower output, current and expected hardware is wholly inadequate, and that will not change any time soon. For that to change, VRAM (GDDRx) would have to serve as nothing more than a cache, with all model data held in abundant, cheap DRAM such as DDR4, not DDR5 with its signal-integrity limits on scaling capacity. Good luck convincing the industry to make 2 TB DDR4 “modules” for this use case now.
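To put rough numbers on that: the open-weights Grok-1 release is a roughly 314B-parameter mixture-of-experts model with about a quarter of its weights active per token, so even at 8 bits per weight it needs on the order of 314 GB just for weights and streams roughly 79 GB of them per generated token. In the bandwidth-bound decode regime that caps tokens per second at memory bandwidth divided by bytes touched per token. A hedged sketch, using approximate published bandwidth figures rather than measurements:

```python
def tokens_per_second(active_params_billion: float,
                      bytes_per_param: float,
                      bandwidth_gb_s: float) -> float:
    """Rough ceiling on decode speed when every active weight must be
    streamed from memory once per generated token (bandwidth-bound)."""
    bytes_per_token_gb = active_params_billion * bytes_per_param
    return bandwidth_gb_s / bytes_per_token_gb

# Grok-1: ~314B total parameters, roughly a quarter active per token (MoE),
# assumed stored at 8 bits per weight -> ~79 GB read per token.
active_b = 314 * 0.25
for name, bw in [("dual-channel DDR4-3200 (~51 GB/s)", 51),
                 ("8-channel DDR4 server (~200 GB/s)", 200),
                 ("single RTX 4090 GDDR6X (~1000 GB/s)", 1000)]:
    print(f"{name}: ~{tokens_per_second(active_b, 1.0, bw):.1f} tokens/s ceiling")
```

That works out to well under 1 token/s on a desktop's dual-channel DDR4 and only a few tokens/s even on an 8-channel server, while the GPU line is purely theoretical since ~314 GB of weights cannot fit in 24 GB of VRAM in the first place, which is exactly why VRAM would have to act as a cache in front of hundreds of gigabytes of cheap DRAM.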