64-Core AMD Processor Can Double Compute Performance

AMD has released the first single-socket, 64 physical core (128 thread) processors. The 2nd Gen AMD EPYC processors deliver world-record, best-in-class performance: up to a 2X generational performance increase that outpaces Intel Xeon Platinum by up to 87%.

AMD EPYC 7002 Series Processors set a new standard for the modern datacenter. Driven by the AMD Infinity Architecture, the AMD EPYC 7002 Family is the first x86-architecture server processor based on 7nm process technology, a hybrid, multi-die architecture, PCIe® Gen4 I/O, and an embedded security architecture. Together, these innovative capabilities deliver what you need: performance leadership for your workloads, and protection at every layer to help secure your CPU, applications, and data—whether in your enterprise datacenter or the public cloud.

Links to AMD:

AMD EPYC 7742
AMD EPYC 7702
AMD EPYC 7702P

SOURCES : AMD
Written By Alvin Wang, Nextbigfuture.com

16 thoughts on “64-Core AMD Processor Can Double Compute Performance”

  1. The “if you’re smart enough” sorta demonstrates my point about academics and their artificial barriers. I don’t think it’s necessary that we have a language that doubles as an IQ test.

    Reply
  2. C++ suffers from this problem that academics tend to introduce, where things never really make it very far outside their thought bubble. Sure, C++ has been around a long time and continues to have a place in modern software, but I think it’s only historical. There are technically superior languages, like Go.

    What makes a language really special isn’t necessarily its technical sophistication; it’s the simplicity of it and the ability of a typical human to express themselves in it. The functional languages are superior because they mask the parallel architecture behind a much simpler framework. Instead of thinking in terms of a “CPU thread” (more a physical constraint than anything), they provide a framework to break things down into a set of stepwise parallel dependencies and data locality, and leave it to a scheduler, not the author, to automatically arrange the pieces.

    I think we’ll continue to make progress toward a simple, expressive language that anybody could use, not just “computer scientists”. I think it ought to be more easily spoken. I think Siri, Google and Alexa will eventually nudge us toward a more syntax-oriented spoken language, with more concrete logical constructs, and a step back toward more stepwise procedural code (like Basic). And it will be far superior to the random collection of unnecessary symbols that make up C++. If you can’t say what you mean when you read C++ aloud, then it’s not a very practical language.
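    The “leave it to a scheduler” idea above can be sketched even in today’s C++ using futures — a hypothetical illustration only, with made-up names (`square`, `sum_of_squares`): you express the independent pieces and the dependency between them, and the runtime decides where and when each piece runs.

```cpp
#include <future>

// Two independent pieces of work, expressed as tasks rather than threads.
int square(int x) { return x * x; }

int sum_of_squares(int x, int y) {
    // The runtime scheduler may run these two tasks in parallel.
    auto a = std::async(std::launch::async, square, x);
    auto b = std::async(std::launch::async, square, y);
    // The dependent step: block only at the point where both results
    // are actually needed, then combine them.
    return a.get() + b.get();
}
```

    The author never mentions a “CPU thread”; the code states the data dependencies and the runtime arranges the execution.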

    Reply
  3. AMD is going chiplet (and it looks like Intel is also moving that way), so expect bigger sockets. AMD’s implementation means there aren’t that many PCIe lanes under any given PCIe root complex, which has been the standing differentiator with AMD as of late, but I guess that’s disappearing as well. I guess those Intel chips that allegedly have 64 lanes per CPU will be the last of the big-root-complex CPUs. If you want fast inter-PCIe bandwidth with low latency, I guess that means paying the Broadcom/PLX tax.

    Reply
  4. Java and C# are not simplifications of C++.

    • C++ is a race car for race car drivers who understand and leverage how race cars work.
    • Java is a sports car for people who do not understand how sports cars work.
    • C# is a car for people who don’t like Java and want to drive from library to library.

    As for a parallel programming language on steroids we have them: LISP/Haskell or C++ if you are smart enough to drive the C++ race car with no anti-lock brakes in the rain.

    Reply
  5. Modern concurrent libraries for C++, in conjunction with C++11 lambda functions, are pretty good. You can execute a lambda function in parallel for each item in an STL container, which covers a fair bit of the common parallelism use cases. Your code would look like this (probably with better indenting):

    std::array<int, 100> data_to_process;
    parallel_for_each(begin(data_to_process), end(data_to_process),
                      [](int& data) { data *= 2; });  // any per-element work

    It’ll work with any random-access container (C arrays, STL arrays, vectors, deques, etc.).

    I’ve never used D but I like some of the ideas in it. C++ definitely has some cruft that could be cleaned out and some areas that perhaps could be programmatically simplified (e.g. at last count Josuttis had found 17 ways to initialize a variable…)

    Reply
  6. Well technically, a GPU is a different KIND of computing element than a CPU “core”. A CPU core must support execution of the entire ISA (instruction set architecture). An X86-derivative chip’s cores support all of the grew-like-kudzu instructions that were added every time Intel and AMD ‘extended the instruction set’. Remember SSE, SSE2, SSE3, MMX and all that? Every conventional X86 CPU core still supports all of it.

    By comparison, the ‘shaders’ and “CE’s” of a GPU are WAY simpler than a ‘core’. Like perhaps only 5% as complex (95% less complex). Further, they’re in the business of executing same-code-different-data in parallel. Nominally perfect for graphics, where hundreds of millions of pixels need computing per second. It mightn’t be terribly complex to “compute the next pixel” of the (70 Hz × 4,000 × 2,500) = 700,000,000 pixels per second, but multiply even a simple operation by seven hundred million… and even 64 cores might not be able to keep up for very long.

    But it is no problem for a GPU.

    Turns out, there are some computing problems that depend on HUGE computations of a small algorithm working on a large dataset, and these benefit markedly from GPU co-processing. Hence their utility is larger than “just graphics”, and even larger than the more limited “just gaming graphics”.
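    The same-code-different-data pattern can be sketched in plain C++ — a hypothetical illustration with made-up names (`shade`, `shade_frame`): one tiny per-pixel function applied independently to every element. A GPU would run it on thousands of simple shader cores at once; here we fake that with two halves of the frame processed concurrently on the CPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// One trivial per-pixel operation: invert the RGB channels.
uint32_t shade(uint32_t pixel) { return pixel ^ 0x00FFFFFFu; }

// Apply the same function to a range of pixels — "same code, different data".
void shade_range(std::vector<uint32_t>& frame, std::size_t lo, std::size_t hi) {
    for (std::size_t i = lo; i < hi; ++i) frame[i] = shade(frame[i]);
}

std::vector<uint32_t> shade_frame(std::vector<uint32_t> frame) {
    std::size_t mid = frame.size() / 2;
    // Lower half on a spawned task, upper half on this thread.
    auto lower = std::async(std::launch::async, shade_range,
                            std::ref(frame), std::size_t{0}, mid);
    shade_range(frame, mid, frame.size());
    lower.get();  // wait for the spawned half to finish
    return frame;
}
```

    Every pixel is independent, so the work splits cleanly across however many executors you have — two CPU threads here, thousands of shader units on a GPU.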

    Just saying,
    GoatGuy ✓

    Reply
  7. In all fairness… I did NOT invent the ‘D’ language that is presently in the wild. Mine was another completely ‘one-man-effort’ thing, and it only got so far as having a crâhppy code generator. It did not depend on using a C compiler either … it was its own beast. However, raising families, being a co-owner in another burgeoning 1990s tech business … I really had little time to continue its development. That along with another 5 projects that had to be shelved. 

    So… No, I’m not the inventor of the real D language.

    BTW, Java is definitely not a simplification of C++, nor was it meant to be. It was a joint American-European effort, and as is completely typical, it suffers from being so crafted. C#, yes, that was Microsoft’s ‘object simplification’, especially wrapped around the Windows O/S’s visibility interface for Windows-based GUI applications. Not exclusively, but it sure made things easier.

    I agree with the rest of your comment. 
    GoatGuy ✓

    Reply
  8. wow… GoatGuy is the inventor of the D programming language??? I remember reading about that on Wikipedia many times… wondering if this guy was half nuts to take on C++… not that it hasn’t been done at least 20 times already… most notably Java and C#, motivated as simplifications of C++ to make the object model less annoying… (Personally, I had always wished that Delphi had won…) I think if they really want to make a parallel programming language on steroids, they should just take SystemVerilog, change all the syntax that doesn’t look like C++, and make it follow the C++ brace convention instead of “begin”/“end”… then you have a real parallel programming language… all it really needs is a standard template library like C++… Xilinx even has a compiler to convert the code you write into FPGA logic…

    Reply
  9. Who needs a GPU when you have 64 CPU cores??? I remember this announcement last year… big cover-up by Intel… they told everyone they had a 64-core Xeon as well… then it was uncovered that they were lying… it was only 48 cores and AMD had outdone them…

    Reply
  10. Yay! C++!

    28 years back, I began writing a language specifically suited to parallel processing, yet still syntactically ‘obvious’ enough to be teachable to most programmers. I called it D. In 1991, we definitely had C++ of course. And I liked the language a lot. It seemed though to have taken on that kind of formal-academic cruft that also made Java a pig, and Pascal a beautiful, but wordy mess. 

    The most important invention in D was the ;& operator. It was a programmer’s signal to the compiler to ‘spawn’ the statement thus ended and NOT wait for its result. Thus…

    for(i = 0; i < 100; i++)
    … array[i] *= fn( i, j, k ) ;&

    Became intrinsically parallel, without regard to exactly how many threads one’s CPU-and-operating-system were willing to mete out. Or whether the GPU was involved… or not. The only limit was the iterator {for(….)} part. Obvious? Perhaps.

    The sync/complete part of parallelism was handled in the code generator/optimizer. (A whole lot of ‘automagic’ parallelism didn’t require the ;& operator… technically not even the above did!) An efficient semaphore instantiator kept track of execution dependencies, allowing threads to gracefully block while awaiting still-in-flight results.
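    For readers who don’t know D-the-hypothetical: the described ;& spawn operator corresponds roughly to what futures do in modern C++. A hedged sketch — `fn` is a stand-in for the comment’s `fn(i, j, k)`, and `parallel_scale` is an illustrative name, not anything from the real language:

```cpp
#include <future>
#include <vector>

// Stand-in for the fn(i, j, k) in the comment's example.
int fn(int i, int j, int k) { return i + j + k; }

void parallel_scale(std::vector<int>& arr, int j, int k) {
    std::vector<std::future<void>> spawned;
    for (int i = 0; i < static_cast<int>(arr.size()); ++i)
        // The ";&" idea: spawn the statement and do NOT wait for its result.
        spawned.push_back(std::async(std::launch::async,
            [&arr, i, j, k] { arr[i] *= fn(i, j, k); }));
    // The implicit sync/complete step the code generator would insert.
    for (auto& f : spawned) f.get();
}
```

    Each iteration writes a distinct element, so the spawned statements are free of data races; the final loop is the “block awaiting still-in-flight results” part.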

    Anyway fun programmer stuff. 
    GoatGuy ✓

    Reply
  11. “It’s a good time to be a computer scientist with a parallel-processing speciality.”

    Most of the kids coming out of college are focusing on AI and training neural nets to sit and beg.

    As someone who enjoys parallel computation whenever I get to attempt it (using God’s language: C++), I can say that the languages are behind where the hardware is. Sure, the functional languages (LISP, Haskell) can scale better on these machines, but they are not widespread and their syntax, ahem… sucks.

    We need a functional subset of C/C++ that can run on both CPUs and GPUs.
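    The closest thing today to that “functional subset” is writing pure, side-effect-free kernels and feeding them to the standard algorithms — a hedged sketch, with illustrative names (`saxpy`, `saxpy_all`). The same pure kernel can then be dispatched across CPU cores (C++17’s std::execution::par policies) or offloaded to GPUs by compilers that support it.

```cpp
#include <algorithm>
#include <vector>

// A pure kernel: no shared state, no side effects — safe to run anywhere.
int saxpy(int a, int x, int y) { return a * x + y; }

std::vector<int> saxpy_all(int a, const std::vector<int>& xs, int y) {
    std::vector<int> out(xs.size());
    // Sequential here; adding std::execution::par (C++17) parallelizes the
    // identical code, precisely because the kernel is functional.
    std::transform(xs.begin(), xs.end(), out.begin(),
                   [a, y](int x) { return saxpy(a, x, y); });
    return out;
}
```

    The design point: purity is what makes the kernel portable between CPU threads and GPU lanes — the algorithm call site barely changes.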

    Reply
  12. Agreed. I’m thinking AMD’s acquisition of ATI certainly brought in some expertise that will see the core count you spoke of. Intel’s inability/unwillingness to acquire Nvidia may cost it in the short term. The increase in core count on a single machine will open up many possibilities, but in my opinion the greatest short term effect will be infrastructure consolidation and subsequent efficiencies found through virtualisation on said machines. I’m certainly giddy about the next 5 years.

    Reply
  13. I’m amazed by “this round” of mega-chips from AMD and Intel. 64 cores simultaneously executing code at nearly unimpeded speed. Amazing.

    Kind of leads to a “futurist commentary”, right?

    E.g. Will 7 nanometer design rules with EUV processing lead quickly to 5, then 3 then 2.5 and so on?

    Moore’s Law is definitely being challenged by optical and atomic physics. Yet, while no single die today can house all 64 cores, 256 megabytes of cache, and the interprocessor traffic, it is clear that multiple chiplets can. Will this be extended to semi-3D topologies (stacked chiplets) too?

    TSMC sez it will start production at the 5 nm node “before 2021”. We can expect rapid advance to 3 or 3.5 nm from there, since EUV tech doesn’t see these dimensions as much harder. Below about 2.5 nm, the statistical distribution of dopant atoms in the silicon crystal matrix (critical for logic gate operation) becomes difficult to bracket for trillions of transistors.  

    But if the ‘partial failure still makes a good chip’ theology continues, then it shouldn’t matter. Soon, probably very soon, we’ll see 256-core and higher counts per package. I’ve long felt that 512-to-1024-core all-on-one-carrier chips are the sweet spot. Who knows… we might be seeing them in less than 5 years.

    It’s a good time to be a computer scientist with a parallel-processing speciality.

    Just saying,
    GoatGuy ✓

    Reply

Leave a Comment