Nvidia and AMD battle with multi-teraflop GPUs

HPCWire – NVIDIA today launched the second generation of its breakthrough workstation platform, NVIDIA® Maximus™, featuring Kepler™, the fastest, most efficient GPU architecture.

The Maximus platform, introduced in November, gives workstation users the ability to simultaneously perform complex analysis and visualization on a single machine. Now supported by Kepler-based GPUs, Maximus delivers unparalleled performance and efficiency to professionals in fields as varied as manufacturing, visual effects and oil exploration.

Second generation NVIDIA Maximus-powered desktop workstations featuring the new NVIDIA Quadro K5000 ($2,249 MSRP, USD) plus the new NVIDIA Tesla K20 GPU ($3,199 MSRP, USD) will be available starting in December 2012. The NVIDIA Quadro K5000 will be available as a separate discrete desktop GPU starting in October 2012.

The first Quadro workstation graphics card based on the Kepler architecture, the K5000, was shown at the SIGGRAPH trade show. The K5000 has one Kepler GK104 GPU (the chip that the Tesla K10 carries two of), which has 1,536 CUDA cores and delivers 2.1 teraflops of single-precision floating point math.
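As a rough sanity check on the 2.1-teraflops figure: peak single-precision throughput is conventionally quoted as cores × 2 floating point operations per clock (one fused multiply-add) × clock rate. The sketch below assumes that convention and works backward to the core clock the quoted figure implies; the function names are illustrative only.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Peak gigaflops, assuming each core retires one fused multiply-add (2 FLOPs) per clock."""
    return cores * flops_per_cycle * clock_ghz


def implied_clock_mhz(cores: int, peak_tflops: float, flops_per_cycle: int = 2) -> float:
    """Clock rate in MHz needed to hit a quoted peak, under the same FMA assumption."""
    return peak_tflops * 1e6 / (cores * flops_per_cycle)


# Quadro K5000 figures as quoted: 1,536 CUDA cores, 2.1 teraflops single precision.
clock = implied_clock_mhz(1536, 2.1)
print(f"Implied K5000 clock: {clock:.1f} MHz")
print(f"Peak at that clock: {peak_gflops(1536, clock / 1000):.0f} gigaflops")
```

Treat the result (about 684 MHz) as arithmetic on the quoted figures rather than a published clock specification.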

Composed of 7.1 billion transistors, the Kepler GK110 GPU, which powers the Tesla K20, is an engineering marvel created to address the most daunting challenges in HPC. One of its features, Hyper-Q, lets multiple CUDA streams, Message Passing Interface (MPI) processes, or even threads within a single process feed work to the GPU simultaneously. NVIDIA says existing applications that were previously limited by false dependencies can see up to a 32x performance increase without changing any existing code.

AMD also has multiple teraflops in one GPU

Advanced Micro Devices is keeping the heat on rival Nvidia in the workstation graphics market with the launch of four FirePro graphics cards, topping out at nearly 4 teraflops of single-precision floating point performance, and two CPU-GPU hybrids based on the new “Piledriver” Opteron cores, which bear the FirePro brand rather than the Fusion APU brand used for consumer gear.

AMD says that the new lineup of discrete graphics cards is aimed at workstation users rather than gamers, offering the performance that end users running CAD/CAM, EDA, and various media and rendering applications need, at prices they are willing to pay.

[Image: AMD FirePro W9000 and W8000 discrete GPUs]

There are four new FirePro graphics cards for workstations, all of which are based on AMD’s “Southern Islands” family of GPUs that are already inside the top-end Radeon lineup for gamers and consumers. These are fabbed by Taiwan Semiconductor Manufacturing Corp using its popular and slowly ramping 28 nanometer process.

All of the new FirePro cards plug into similarly shiny PCI-Express 3.0 x16 slots that made their debut earlier this year in workstations and servers using Intel’s Xeon E3-1200 v2, E5-2400, and E5-2600 processors. No other CPUs support PCI-Express 3.0 slots yet, but presumably AMD will get the lead out and add PCI-Express 3.0 support to the next iterations of its chips.

With the “Tahiti” GPU that is part of the Southern Islands family, AMD has put the single-precision floating point pedal to the metal, cranking up to just under 4 teraflops and offering nearly 1 teraflops of double-precision performance. The top-end W9000 card, which uses this GPU, has a 384-bit memory interface and delivers 264GB/sec of bandwidth across its 6GB of GDDR5 memory.
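Peak memory bandwidth is the bus width in bytes times the per-pin data rate, so the quoted figures pin down how fast the GDDR5 runs. The sketch below assumes that standard relationship and decimal gigabytes; the helper names are hypothetical, and the W8000 and Quadro 6000 numbers are the ones quoted later in this article.

```python
def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/sec: (bus width in bytes) x (data rate per pin in Gbit/s)."""
    return bus_bits / 8 * gbps_per_pin


def implied_data_rate_gbps(bus_bits: int, bandwidth: float) -> float:
    """Per-pin data rate in Gbit/s needed to reach a quoted bandwidth in GB/sec."""
    return bandwidth / (bus_bits / 8)


# Bus width (bits) and bandwidth (GB/sec) as quoted in the article.
cards = {
    "FirePro W9000": (384, 264),
    "FirePro W8000": (256, 176),
    "Quadro 6000": (384, 144),
}
for name, (bits, bw) in cards.items():
    print(f"{name}: {implied_data_rate_gbps(bits, bw):.2f} Gbit/s per pin")
```

Both FirePro cards work out to 5.5Gbit/sec per pin, so on these numbers the W8000's lower bandwidth comes entirely from its narrower 256-bit bus.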

[Image: AMD FirePro workstation discrete graphics cards]

AMD is pitching the W9000 against Nvidia’s Quadro 6000 card, which also has a 384-bit memory interface but only 144GB/sec of memory bandwidth across its 6GB of graphics memory. Because it is based on the earlier “Fermi” GPU from Nvidia (and one with only 448 of its 512 cores activated), it delivers only 1.03 teraflops single precision and 515.2 gigaflops double precision. Even the new Quadro K5000 graphics card, which has 1,536 cores in its Kepler GPU, cranks out only 2.1 teraflops at single precision and only 173GB/sec across its 4GB of GDDR5 graphics memory.
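Dividing the quoted figures makes the gap concrete. This sketch rounds the W9000’s “just under 4 teraflops” up to 4.0 (an assumption that slightly flatters AMD) and uses the other numbers exactly as the article gives them.

```python
# Peak single-precision teraflops and memory bandwidth (GB/sec) as quoted;
# the W9000's SP figure is rounded up from "just under 4 teraflops".
specs = {
    "FirePro W9000": {"sp_tflops": 4.0, "bw_gb_s": 264},
    "Quadro 6000": {"sp_tflops": 1.03, "bw_gb_s": 144},
    "Quadro K5000": {"sp_tflops": 2.1, "bw_gb_s": 173},
}


def advantage(a: str, b: str, key: str) -> float:
    """How many times larger card a's figure is than card b's for one spec."""
    return specs[a][key] / specs[b][key]


for rival in ("Quadro 6000", "Quadro K5000"):
    sp = advantage("FirePro W9000", rival, "sp_tflops")
    bw = advantage("FirePro W9000", rival, "bw_gb_s")
    print(f"W9000 vs {rival}: {sp:.2f}x SP, {bw:.2f}x bandwidth")
```

On paper the W9000’s lead is larger in raw flops (3.88x and 1.90x) than in memory bandwidth (1.83x and 1.53x), which is worth keeping in mind for bandwidth-bound workloads.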

The good news for AMD is that the new FirePro W9000 and W8000 graphics cards support error correction on their GDDR5 memory, something Nvidia has offered since its Fermi GPUs and something AMD needs in its own GPUs if it hopes to revive the FireStream GPU coprocessor biz or pitch FirePro cards against Nvidia Tesla GPU coprocessors in supercomputers.

[Image: The FirePro W7000 graphics card]

The W9000 can drive up to six displays and costs a rather steep $3,999, or roughly a buck per gigaflops at single precision if you want to look at it that way, or just a little over four bucks per gigaflops at double precision if that is more important to your workloads. For plenty of applications, particularly in signal processing, seismic processing, life sciences, and media, the work is done in single precision, so that is the figure that matters.
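The price-per-flops figures are simple division. This sketch rounds the W9000’s throughput up to 4,000 gigaflops SP and 1,000 gigaflops DP (an assumption based on the article’s “just under 4” and “nearly 1” teraflops), so the results slightly understate the true cost per gigaflops.

```python
PRICE_USD = 3999
SP_GFLOPS = 4000  # assumption: rounded up from "just under 4 teraflops"
DP_GFLOPS = 1000  # assumption: rounded up from "nearly 1 teraflops"


def dollars_per_gflops(price_usd: float, gflops: float) -> float:
    """List price divided by peak gigaflops."""
    return price_usd / gflops


print(f"SP: ${dollars_per_gflops(PRICE_USD, SP_GFLOPS):.2f} per gigaflops")
print(f"DP: ${dollars_per_gflops(PRICE_USD, DP_GFLOPS):.2f} per gigaflops")
# One gigaflops is 1,000 megaflops, so per megaflops the SP cost is about a tenth of a cent.
print(f"SP: ${PRICE_USD / (SP_GFLOPS * 1000):.4f} per megaflops")
```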

The W8000 has 19 per cent less SP and DP oomph than the W9000, as well as 33 per cent less memory at 4GB, and its memory interface is only 256 bits wide, delivering 176GB/sec of bandwidth. But it costs only $1,599, and when you do the math, it delivers twice the bang for the buck of the W9000 it sits beneath in the product line, on both raw SP and DP operations. AMD is selling the W8000 against the Nvidia Quadro 5000, which is rated at 718 gigaflops SP and 359 gigaflops DP.
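The “twice the bang for the buck” claim follows directly from the two prices and the 19 per cent throughput gap. Because the gap is the same for SP and DP, one ratio covers both; the sketch below assumes the article’s percentages are exact, and `relative_value` is a hypothetical helper.

```python
W9000_PRICE_USD = 3999
W8000_PRICE_USD = 1599
THROUGHPUT_FRACTION = 1 - 0.19  # W8000 has "19 per cent less SP and DP oomph"


def relative_value(throughput_fraction: float, price_a: float, price_b: float) -> float:
    """Flops per dollar of card A relative to card B, given A's throughput as a fraction of B's."""
    return throughput_fraction * price_b / price_a


value = relative_value(THROUGHPUT_FRACTION, W8000_PRICE_USD, W9000_PRICE_USD)
memory_cut = 1 - 4 / 6  # 4GB versus 6GB

print(f"W8000 flops per dollar vs W9000: {value:.2f}x")  # same for SP and DP
print(f"Memory reduction: {memory_cut:.0%}")
```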

Both the W9000 and W8000 are dual-slot cards, taking up two slots’ worth of space in a workstation or server while plugging into a single PCI-Express 3.0 x16 slot, and they are relatively power-hungry, rated at 274 watts and 225 watts, respectively.

The W7000 has fewer cores running at slower clock speeds on the Southern Islands GPU, and therefore only delivers 2.4 teraflops SP and a mere 152 gigaflops DP, but again, it only costs $899. That’s a bit cheaper for single-precision math than the W8000 – about 24 per cent less dough per flops – but if you need any double-precision math, the W7000 is not for you. The W7000 is targeted at customers who might otherwise buy an Nvidia Quadro 4000.
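The W7000 comparison works the same way. In this sketch the W8000’s throughput is derived as 81 per cent of a W9000 rounded to 4,000 gigaflops SP and 1,000 gigaflops DP, which is an assumption layered on the article’s percentages rather than a quoted spec.

```python
# Price (USD) and peak gigaflops; W8000 figures are derived, not quoted.
cards = {
    "W7000": {"price": 899, "sp": 2400, "dp": 152},
    "W8000": {"price": 1599, "sp": 0.81 * 4000, "dp": 0.81 * 1000},
}


def cost_per_gflops(card: str, precision: str) -> float:
    """Dollars per peak gigaflops at the given precision ('sp' or 'dp')."""
    return cards[card]["price"] / cards[card][precision]


sp_saving = 1 - cost_per_gflops("W7000", "sp") / cost_per_gflops("W8000", "sp")
print(f"W7000 SP cost per gigaflops is {sp_saving:.0%} lower than the W8000's")
for card in cards:
    print(f"{card} DP: ${cost_per_gflops(card, 'dp'):.2f} per gigaflops")
```

At roughly three times the W8000’s cost per double-precision gigaflops ($5.91 versus $1.97 on these assumptions), the numbers back up the advice to skip the W7000 for DP work.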
