64-Core AMD Processor Can Double Compute Performance

AMD has released the first single-socket, 64 physical core (128 thread) processors. The 2nd Gen AMD EPYC processors deliver world-record, best-in-class performance: up to a 2X generational performance increase that outpaces Intel Xeon Platinum by up to 87%.

AMD EPYC 7002 Series Processors set a new standard for the modern datacenter. Driven by the AMD Infinity Architecture, the AMD EPYC 7002 Family is the first x86-architecture server processor based on 7nm process technology, a hybrid, multi-die architecture, PCIe® Gen4 I/O, and an embedded security architecture. Together, these innovative capabilities deliver what you need: performance leadership for your workloads, and protection at every layer to help secure your CPU, applications, and data—whether in your enterprise datacenter or the public cloud.

Links to AMD:

AMD EPYC 7742
AMD EPYC 7702
AMD EPYC 7702P

SOURCES : AMD
Written By Alvin Wang, Nextbigfuture.com

16 thoughts on “64-Core AMD Processor Can Double Compute Performance”

  1. The “if you’re smart enough” sorta demonstrates my point about academics and their artificial barriers. I don’t think it’s necessary that we have a language that doubles as an IQ test.

    Reply
  2. C++ suffers from this problem that academics tend to introduce, where things never really make it very far outside their thought bubble. Sure, C++ has been around a long time and continues to have a place in modern software, but I think it’s only historical. There are technically superior languages, like Go.

    What makes a language really special isn’t necessarily its technical sophistication; it’s the simplicity of it and the ability of a typical human to express themselves in it. The functional languages are superior because they mask the parallel architecture behind a much simpler framework. Instead of thinking in terms of a “CPU thread” (more a physical constraint than anything), they provide a framework to break things down into a set of stepwise parallel dependencies and data locality, and leave it to a scheduler, not the author, to automatically arrange the pieces.

    I think we’ll continue to make progress toward a simple, expressive language that anybody could use, not just “computer scientists”. I think it ought to be more easily spoken. I think Siri, Google and Alexa will eventually nudge us toward a more syntax-oriented spoken language, with more concrete logical constructs, and a step back toward more stepwise procedural code (like Basic). And it will be far superior to the random collection of unnecessary symbols that make up C++. If you can’t say what you mean when you read C++ aloud, then it’s not a very practical language.
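    The “leave it to a scheduler” idea above can be sketched even in today’s C++ using futures — a hypothetical illustration only, with made-up names (`square`, `sum_of_squares`): you express the independent pieces and the dependency between them, and the runtime decides where and when each piece runs.

```cpp
#include <future>

// Two independent pieces of work, expressed as tasks rather than threads.
int square(int x) { return x * x; }

int sum_of_squares(int x, int y) {
    // The runtime scheduler may run these two tasks in parallel.
    auto a = std::async(std::launch::async, square, x);
    auto b = std::async(std::launch::async, square, y);
    // The dependent step: block only at the point where both results
    // are actually needed, then combine them.
    return a.get() + b.get();
}
```

    The author never mentions a “CPU thread”; the code states the data dependencies and the runtime arranges the execution.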

    Reply
  3. AMD is going chiplet (and it looks like Intel is also moving that way), so expect bigger sockets. AMD’s implementation means there aren’t that many PCIe lanes under any given PCIe root complex, which has been the standing differentiator with AMD as of late, but I guess that’s disappearing as well. I guess those Intel chips that allegedly have 64 lanes per CPU will be the last of the big-root-complex CPUs. If you want fast inter-PCIe bandwidth with low latency, I guess that means paying the Broadcom/PLX tax.

    Reply
  4. Java and C# are not simplifications of C++.

    • C++ is a race car for race car drivers who understand and leverage how race cars work.
    • Java is a sports car for people who do not understand how sports cars work.
    • C# is a car for people who don’t like Java and want to drive from library to library.

    As for a parallel programming language on steroids we have them: LISP/Haskell or C++ if you are smart enough to drive the C++ race car with no anti-lock brakes in the rain.

    Reply
  5. Modern concurrent libraries for C++, in conjunction with C++11 lambda functions, are pretty good. You can execute a lambda function in parallel for each item in an STL container, which covers a fair bit of the common parallelism use cases. Your code would look like this (probably with better indenting):

    std::array<int, 100> data_to_process;
    parallel_for_each(begin(data_to_process), end(data_to_process),
                      [](int& data) { data *= 2; });  // any per-element work

    It’ll work with any random-access container (C arrays, STL arrays, vectors, deques, etc.).

    I’ve never used D but I like some of the ideas in it. C++ definitely has some cruft that could be cleaned out and some areas that perhaps could be programmatically simplified (e.g. at last count Josuttis had found 17 ways to initialize a variable…)

    Reply
  6. Well technically, a GPU is a different KIND of computing element than a CPU “core”. A CPU core must support execution of the entire ISA (instruction set architecture). An X86-derivative chip’s cores support all of the grew-like-kudzu instructions that were added every time Intel and AMD ‘extended the instruction set’. Remember SSE, SSE2, SSE3, MMX and all that? Every conventional X86 CPU core still supports all of it.

    By comparison, the ‘shaders’ and “CE’s” of a GPU are WAY simpler than a ‘core’. Like perhaps only 5% as complex (95% less complex). Further, they’re in the business of executing same-code-different-data in parallel. Nominally perfect for graphics, where hundreds of millions of pixels need computing per second. It mightn’t be terribly complex to “compute the next pixel” of the (70 Hz × 4,000 × 2,500) = 700,000,000 pixels per second, but multiply even a simple operation by seven hundred million… and even 64 cores might not be able to keep up for very long.

    But it is no problem for a GPU.

    Turns out, there are some computing problems that depend on HUGE computations of a small algorithm working on a large dataset, and these benefit markedly from GPU co-processing. Hence their utility is larger than “just graphics”, and even larger than the more limited “just gaming graphics”.
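    The same-code-different-data pattern can be sketched in plain C++ — a hypothetical illustration with made-up names (`shade`, `shade_frame`): one tiny per-pixel function applied independently to every element. A GPU would run it on thousands of simple shader cores at once; here we fake that with two halves of the frame processed concurrently on the CPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// One trivial per-pixel operation: invert the RGB channels.
uint32_t shade(uint32_t pixel) { return pixel ^ 0x00FFFFFFu; }

// Apply the same function to a range of pixels — "same code, different data".
void shade_range(std::vector<uint32_t>& frame, std::size_t lo, std::size_t hi) {
    for (std::size_t i = lo; i < hi; ++i) frame[i] = shade(frame[i]);
}

std::vector<uint32_t> shade_frame(std::vector<uint32_t> frame) {
    std::size_t mid = frame.size() / 2;
    // Lower half on a spawned task, upper half on this thread.
    auto lower = std::async(std::launch::async, shade_range,
                            std::ref(frame), std::size_t{0}, mid);
    shade_range(frame, mid, frame.size());
    lower.get();  // wait for the spawned half to finish
    return frame;
}
```

    Every pixel is independent, so the work splits cleanly across however many executors you have — two CPU threads here, thousands of shader units on a GPU.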

    Just saying,
    GoatGuy ✓

    Reply
  7. In all fairness… I did NOT invent the ‘D’ language that is presently in the wild. Mine was another completely ‘one-man-effort’ thing, and it only got so far as having a crâhppy code generator. It did not depend on using a C compiler either … it was its own beast. However, raising families, being a co-owner in another burgeoning 1990s tech business … I really had little time to continue its development. That along with another 5 projects that had to be shelved. 

    So… No, I’m not the inventor of the real D language.

    BTW, Java is definitely not a simplification of C++, nor was it meant to be. It was a joint American-European effort, and as is completely typical, it suffers from being so crafted. C#, yes, that was Microsoft’s ‘object simplification’, especially wrapped around the Windows O/S’s visibility interface for Windows-based GUI applications. Not exclusively, but it sure made things easier.

    I agree with the rest of your comment. 
    GoatGuy ✓

    Reply
  8. wow… GoatGuy is the inventor of the D programming language??? I remember reading about that on Wikipedia many times… wondering if this guy was half nuts to take on C++… not that it hasn’t been done at least 20 times already… most notably Java and C#, motivated as simplifications of C++ to make the object model less annoying… (Personally, I had always wished that Delphi had won…) I think if they really want to make a parallel programming language on steroids, they should just take SystemVerilog, change all the syntax that doesn’t look like C++, and make it follow the C++ brace convention instead of “begin”/“end”… then you have a real parallel programming language… all it really needs is a standard template library like C++… Xilinx even has a compiler to convert the code you write into FPGA logic…

    Reply
  9. Who needs a GPU when you have 64 CPU cores??? I remember this announcement last year… big cover-up by Intel… they told everyone they had a 64-core Xeon as well… then it was uncovered that they were lying… it was only 48 cores and AMD had outdone them…

    Reply
  10. Yay! C++!

    28 years back, I began writing a language specifically suited to parallel processing, yet still syntactically ‘obvious’ enough to be teachable to most programmers. I called it D. In 1991, we definitely had C++ of course. And I liked the language a lot. It seemed though to have taken on that kind of formal-academic cruft that also made Java a pig, and Pascal a beautiful, but wordy mess. 

    The most important invention in D was the ;& operator. It was a programmer’s signal to the compiler to ‘spawn’ the statement thus ended and NOT wait for its result. Thus…

    for(i = 0; i < 100; i++)
    … array[i] *= fn( i, j, k ) ;&

    Became intrinsically parallel, without regard to exactly how many threads one’s CPU-and-operating-system were willing to mete out. Or whether the GPU was involved… or not. The only limit was the iterator {for(….)} part. Obvious? Perhaps.

    The sync/complete part of parallelism was handled in the code generator/optimizer. (A whole lot of ‘automagic’ parallelism didn’t require the ;& operator… technically not even the above did!) An efficient semaphore instantiator kept track of execution dependencies, allowing threads to gracefully block while awaiting still-in-flight results.
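    For readers who don’t know D-the-hypothetical: the described ;& spawn operator corresponds roughly to what futures do in modern C++. A hedged sketch — `fn` is a stand-in for the comment’s `fn(i, j, k)`, and `parallel_scale` is an illustrative name, not anything from the real language:

```cpp
#include <future>
#include <vector>

// Stand-in for the fn(i, j, k) in the comment's example.
int fn(int i, int j, int k) { return i + j + k; }

void parallel_scale(std::vector<int>& arr, int j, int k) {
    std::vector<std::future<void>> spawned;
    for (int i = 0; i < static_cast<int>(arr.size()); ++i)
        // The ";&" idea: spawn the statement and do NOT wait for its result.
        spawned.push_back(std::async(std::launch::async,
            [&arr, i, j, k] { arr[i] *= fn(i, j, k); }));
    // The implicit sync/complete step the code generator would insert.
    for (auto& f : spawned) f.get();
}
```

    Each iteration writes a distinct element, so the spawned statements are free of data races; the final loop is the “block awaiting still-in-flight results” part.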

    Anyway fun programmer stuff. 
    GoatGuy ✓

    Reply
  11. “It’s a good time to be a computer scientist with a parallel-processing speciality.”

    Most of the kids coming out of college are focusing on AI and training neural nets to sit and beg.

    As someone who enjoys parallel computation whenever I get to attempt it (using God’s language: C++), I can say that the languages are behind where the hardware is. Sure, the functional languages (LISP, Haskell) can scale better on these machines, but they are not widespread and their syntax, ahem… sucks.

    We need a functional subset of C/C++ that can run on both CPUs and GPUs.
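    The closest thing today to that “functional subset” is writing pure, side-effect-free kernels and feeding them to the standard algorithms — a hedged sketch, with illustrative names (`saxpy`, `saxpy_all`). The same pure kernel can then be dispatched across CPU cores (C++17’s std::execution::par policies) or offloaded to GPUs by compilers that support it.

```cpp
#include <algorithm>
#include <vector>

// A pure kernel: no shared state, no side effects — safe to run anywhere.
int saxpy(int a, int x, int y) { return a * x + y; }

std::vector<int> saxpy_all(int a, const std::vector<int>& xs, int y) {
    std::vector<int> out(xs.size());
    // Sequential here; adding std::execution::par (C++17) parallelizes the
    // identical code, precisely because the kernel is functional.
    std::transform(xs.begin(), xs.end(), out.begin(),
                   [a, y](int x) { return saxpy(a, x, y); });
    return out;
}
```

    The design point: purity is what makes the kernel portable between CPU threads and GPU lanes — the algorithm call site barely changes.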

    Reply
  12. Agreed. I’m thinking AMD’s acquisition of ATI certainly brought in some expertise that will see the core count you spoke of. Intel’s inability/unwillingness to acquire Nvidia may cost it in the short term. The increase in core count on a single machine will open up many possibilities, but in my opinion the greatest short term effect will be infrastructure consolidation and subsequent efficiencies found through virtualisation on said machines. I’m certainly giddy about the next 5 years.

    Reply
  13. I’m amazed by “this round” of mega-chips from AMD and Intel. 64 cores simultaneously executing code at nearly unimpeded speed. Amazing.

    Kind of leads to a “futurist commentary”, right?

    E.g. Will 7 nanometer design rules with EUV processing lead quickly to 5, then 3 then 2.5 and so on?

    Moore’s Law is definitely being challenged by optical and atomic physics. Yet, while no single die today can house all 64 cores, 256 megabytes of cache, and the interprocessor traffic, it is clear that multiple chiplets can. Will this be extended to semi-3D topologies (stacked chiplets) too?

    TSMC sez it will start production at the 5 nm node “before 2021”. We can expect rapid advance to 3 or 3.5 nm from there, since EUV tech doesn’t see these dimensions as much harder. Below about 2.5 nm, the statistical distribution of dopant atoms in the silicon crystal matrix (critical for logic gate operation) becomes difficult to bracket for trillions of transistors.  

    But if the ‘partial failure still makes a good chip’ theology continues, then it shouldn’t matter. Soon, probably very soon, we’ll see 256-core and higher counts per package. I’ve long felt that 512-to-1024-core all-on-one-carrier chips are the sweet spot. Who knows… we might be seeing them in less than 5 years.

    It’s a good time to be a computer scientist with a parallel-processing speciality.

    Just saying,
    GoatGuy ✓

    Reply

Leave a Comment