Nvidia Fermi, AMD Radeon and Intel Larrabee

IEEE Spectrum predicted that Intel’s Larrabee chip would be a technological winner.

We believe that Larrabee is a winner because of the one feature that it alone offers: C++ programmability. The first test chips came out only a few weeks ago (January 2009), and the product won't reach store shelves until late 2009.

Intel's Larrabee looks to be 6 to 12 months later than planned, while Nvidia's next-generation GPU, Fermi, will be out with C++ support.

The last stated release date from Intel was either late this year or early 2010. That does not seem likely now.

“I never thought they were on time and I don’t think they are on track,” said Jim McGregor, chief technology analyst with In-Stat. “And I don’t think they are going to make their goal. Their goal that Pat [Gelsinger, former senior vice president who headed the Larrabee project] said last year was if it can’t compete on the highest end, they won’t release it.”

(May 2009) Intel VP of corporate technology Joseph Schultz mentioned that Intel is now looking at a release date of the first half of 2010, moved back from the company's original late-2009 target.

The Top 10 Innovations in the New NVIDIA Fermi Architecture, and the Top 3 Next Challenges, by David Patterson, Director, Parallel Computing Research Laboratory (Par Lab), U.C. Berkeley

I believe the Fermi architecture is as big an architectural advance over G80 as G80 was over NV40. The combined result represents a giant step towards bringing GPUs into mainstream computing. The table previews my take on the Top 10 most important innovations in the new Fermi architecture. This list is from a computer architect's perspective, as a user would surely rank performance higher. At the end of the paper, I offer 3 challenges on how to bring future GPUs even closer to mainstream computing, which the table also lists.

Past GPUs had a variety of different types of memories, each in its own address space. Although these could be used to achieve excellent performance, such architectures are problematic for programming languages that rely on pointers to any piece of data in memory, such as C, C++, and CUDA.

Fermi has rectified that problem by placing those separate memories (the local scratchpad memory, the graphics memory, and system memory) into a single 64-bit address space, thereby making it much easier to compile and run C and C++ programs on Fermi. Once again, PTX enabled this relatively dramatic architecture change without the legacy binary compatibility problems of mainstream computing.
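To make the contrast concrete, here is a minimal CUDA sketch (mine, not from Patterson's paper) of the pattern that unified addressing enables: a single device function that dereferences a plain pointer without knowing whether it targets shared or global memory. Before Fermi, the compiler had to emit a different load instruction for each memory space and infer statically which space a pointer belonged to; on Fermi-class hardware (compute capability 2.0 and later) the hardware resolves the space at run time.

```cuda
#include <cstdio>

// A generic device function: it does not know (or care) whether `p`
// points into shared memory or global memory. Fermi's unified 64-bit
// address space lets the hardware pick the right memory space at
// run time, so one compiled function serves both cases.
__device__ float sum3(const float* p) {
    return p[0] + p[1] + p[2];
}

__global__ void kernel(const float* global_data, float* out) {
    __shared__ float shared_data[3];
    if (threadIdx.x < 3)
        shared_data[threadIdx.x] = 2.0f * global_data[threadIdx.x];
    __syncthreads();

    if (threadIdx.x == 0) {
        float a = sum3(global_data);   // same function on a global pointer...
        float b = sum3(shared_data);   // ...and on a shared-memory pointer
        *out = a + b;
    }
}

int main() {
    float h_in[3] = {1.0f, 2.0f, 3.0f}, h_out = 0.0f;
    float *d_in, *d_out;
    cudaMalloc((void**)&d_in, sizeof(h_in));
    cudaMalloc((void**)&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    kernel<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", h_out);  // expected: (1+2+3) + (2+4+6) = 18
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Built with any compute capability of 2.0 or higher, this compiles cleanly; on pre-Fermi targets the compiler could not generate correct code for `sum3` when callers passed pointers into different memory spaces.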

21-page Nvidia white paper on Fermi

The implementation of a unified address space enables Fermi to support true C++ programs. In C++, all variables and functions reside in objects which are passed via pointers. PTX 2.0 makes it possible to use unified pointers to pass objects in any memory space, and Fermi’s hardware address translation unit automatically maps pointer references to the correct memory space. Fermi and the PTX 2.0 ISA also add support for C++ virtual functions, function pointers, and ‘new’ and ‘delete’ operators for dynamic object allocation and de-allocation. C++ exception handling operations ‘try’ and ‘catch’ are also supported.
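The following CUDA sketch (illustrative; not taken from the white paper) exercises two of the features listed above: a virtual function dispatched through a base-class pointer, and device-side `new`/`delete`. One caveat worth noting: on Fermi-class hardware the object must be constructed in device code, since its vtable pointer has to refer to device memory.

```cuda
#include <cstdio>

// Minimal sketch of the C++ features the white paper lists: virtual
// dispatch through a base-class pointer, with the object created and
// destroyed on the device heap via `new`/`delete` (compute 2.0+).
struct Shape {
    __device__ virtual float area() const = 0;
    __device__ virtual ~Shape() {}
};

struct Square : Shape {
    float side;
    __device__ Square(float s) : side(s) {}
    __device__ float area() const override { return side * side; }
};

__global__ void area_kernel(float* out, float side) {
    Shape* s = new Square(side);  // dynamic allocation in device code
    *out = s->area();             // virtual dispatch via base pointer
    delete s;                     // device-side de-allocation
}

int main() {
    float *d_out, h_out = 0.0f;
    cudaMalloc((void**)&d_out, sizeof(float));
    area_kernel<<<1, 1>>>(d_out, 3.0f);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("area = %f\n", h_out);  // expected: 9
    cudaFree(d_out);
    return 0;
}
```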

Nvidia Fermi versus AMD Radeon

Fermi versus AMD Radeon: Who Wins, Who Loses in Supercomputing Applications?


While AMD and Nvidia battle for supremacy in the GPU computing market, there's one obvious loser: Intel. AMD's 5870 appeared on schedule. Although Nvidia's Fermi is late, its prior-generation GTX 280 still has some life left in it. But Intel's many-core Larrabee is still a no-show, and the company's Larrabee demo at its recent Developers' Forum was universally regarded as brain-dead, if not an outright embarrassment. End users with high-performance computational requirements previously filled their data centers with racks of x86 servers; now they have discovered and validated an alternative approach that requires fewer x86 CPUs, less power, and less space. GPU computing won't solve all the world's computing problems, but it will give users who buy their systems by the teraflop a new, more cost-effective alternative that will take some of the wind out of Intel's high-performance computing sales. GPU computing is here to stay, and the market will punish those who lack a competitive offering.