AMD and Nvidia are both introducing more teraflop-class chips. The AMD FireStream 9250 will be available at the end of September 2008 for $1,999. It delivers about 1 teraflop of single-precision performance and also supports double-precision floating-point calculations.
A four-GPU CrossFire X configuration is good for almost 5 TFlops, or about 1.2 TFlops per RV770 GPU. The R700 (the dual-GPU 4870 X2 card) offers almost twice that per-GPU figure: according to AMD, the R700 will deliver 2 TFlops per board. AMD's chips also use less power than Nvidia's.
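The AMD figures above are simple multiples of a per-GPU peak. The sketch below reproduces them under the assumption of perfect linear scaling across GPUs, which real multi-GPU setups rarely achieve; the constant and function names are illustrative, not from any AMD tool.

```python
# Back-of-the-envelope check of the AMD throughput figures quoted above.
# Assumption: ideal linear scaling; these are theoretical peak numbers only.
PER_RV770_TFLOPS = 1.2  # per-GPU figure implied by the four-GPU CrossFire X claim


def aggregate_tflops(gpus: int, per_gpu: float = PER_RV770_TFLOPS) -> float:
    """Ideal peak throughput for `gpus` GPUs, assuming perfect scaling."""
    return gpus * per_gpu


crossfire_4x = aggregate_tflops(4)  # four RV770 GPUs in CrossFire X
r700_board = aggregate_tflops(2)    # dual-GPU 4870 X2 (R700)

print(f"4x CrossFire X: {crossfire_4x:.1f} TFlops")  # 4.8, i.e. "almost 5"
print(f"R700 (2 GPUs):  {r700_board:.1f} TFlops")    # ideal 2.4 vs. AMD's stated 2
```

The ideal two-GPU figure (2.4 TFlops) is higher than AMD's quoted 2 TFlops per board; the source does not say what accounts for the difference.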
UPDATE: The Nvidia Tesla T10P is the company’s second-generation general-purpose graphics processing unit (GPGPU) for high-performance computing. It doubles the performance of the previous-generation Tesla 8 product, to 1 teraflop of computational power from 500 gigaflops. The T10P also has more than double the memory: 4 GB versus 1.5 GB in the older model. Nvidia’s Tesla C1060 card plugs into a PCIe slot to bring high-performance computing to workstations, delivering 1 teraflop for $1,699. The T10P also supports double-precision processing, which many scientific supercomputing applications need.
The Nvidia chip has 1.4 billion transistors and runs at speeds of up to 1.6 GHz. “It’s one of the largest chips in mass production, and unlike some processors it is not 60 percent cache,” said Nvidia’s Keane.
Nvidia continues to push its Tesla GPGPU systems, which use graphics chips for technical computing. The company is launching two products for such high-end applications, including a four-GPU, 1U rack-mounted system that delivers up to 4 TFlops at 700 W and sells for $7,995.
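The prices and peak figures quoted so far allow a rough cost and efficiency comparison. The sketch below computes dollars per peak GFLOPS and, where the article gives a power figure, GFLOPS per watt. It treats each vendor's headline teraflop number as comparable, which is an assumption: the figures are vendor peaks that may not be measured on the same basis (for example, single versus double precision). The `Product` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    name: str
    tflops: float                  # vendor-quoted peak teraflops
    price_usd: float               # list price quoted in the article
    watts: Optional[float] = None  # power draw, only where the article gives it

    def usd_per_gflop(self) -> float:
        """List price divided by peak GFLOPS."""
        return self.price_usd / (self.tflops * 1000)

    def gflops_per_watt(self) -> Optional[float]:
        """Peak GFLOPS per watt, or None when no power figure is quoted."""
        if self.watts is None:
            return None
        return self.tflops * 1000 / self.watts


products = [
    Product("AMD FireStream 9250", tflops=1.0, price_usd=1999),
    Product("Nvidia Tesla C1060", tflops=1.0, price_usd=1699),
    Product("Nvidia four-GPU 1U system", tflops=4.0, price_usd=7995, watts=700),
]

for p in products:
    line = f"{p.name}: ${p.usd_per_gflop():.2f} per peak GFLOPS"
    eff = p.gflops_per_watt()
    if eff is not None:
        line += f", {eff:.1f} GFLOPS/W"
    print(line)
```

On these numbers all three land at roughly $1.70 to $2.00 per peak GFLOPS, and the 1U system works out to about 5.7 peak GFLOPS per watt.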
Nvidia is also updating CUDA, its year-old development environment that helps programmers modify C programs to exploit parallelism. CUDA now supports all major 32- and 64-bit operating systems and includes a number of parallel algorithms and tools. On the CUDA road map are support for Fortran and for multi-GPU systems and clusters. Nvidia is also working on C++ support and a hardware debugger.
Evolved Machines uses Tesla processors to run its simulations 130 times faster than on current-generation x86 microprocessors. The company is now designing a rack of GPUs that it says will rival the world’s top systems at 1/100 of their cost.
Simulating a single neuron involves 200,000,000 differential-equation evaluations per second, requiring approximately 4 gigaflops. A neural array engaged in sensory processing contains thousands of neurons, so detailed real-time simulation of neural systems requires more than 10 teraflops of computing power.
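The arithmetic behind the 10-teraflop estimate can be made explicit. The sketch below derives the implied cost per differential-equation evaluation (about 20 flops, which the article does not state directly) and the real-time compute budget for a given neuron count. It assumes cost scales linearly with neuron count and ignores communication between neurons.

```python
# Reproduces the arithmetic in the neural-simulation paragraph above.
EVALS_PER_NEURON_PER_S = 200_000_000  # differential-equation evaluations per second
FLOPS_PER_NEURON = 4e9                # ~4 gigaflops per neuron (article's figure)

# Implied cost of one evaluation, derived from the two figures above.
flops_per_eval = FLOPS_PER_NEURON / EVALS_PER_NEURON_PER_S


def realtime_tflops(neurons: int) -> float:
    """TFlops needed to simulate `neurons` neurons in real time (linear scaling assumed)."""
    return neurons * FLOPS_PER_NEURON / 1e12


print(f"~{flops_per_eval:.0f} flops per evaluation")
for n in (1_000, 2_500, 5_000):
    print(f"{n:>5} neurons -> {realtime_tflops(n):.0f} TFlops")
```

Under this model, about 2,500 neurons already reach 10 TFlops, which is consistent with "thousands of neurons" needing more than 10 teraflops.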
Intel Larrabee chip was delayed
According to Hiroshige Goto’s weekly column from Japan, there will be 24- and 32-core variants out in 2009 and a 48-core chip in 2010.
AMD’s new high-end graphics chips will be a cheaper two-chip set
AMD says its 4850 device, at about 110 W and $199, will deliver about 75 percent of the performance of Nvidia’s high-end GTX280, which costs $649 and dissipates 236 W. AMD will claim technology leadership in two areas. First, its chips will use more than 500 cores, more than double the 240 cores in the new Nvidia parts. Second, they will use GDDR5 memory interfaces running at about 3.2 Gbits/s per pin or more. Nvidia will use the existing GDDR3 protocol, running at up to 1.1 GHz on a 512-bit interface, to deliver memory bandwidth of up to about 102 Gbytes/s on some versions.
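AMD's claim translates into efficiency ratios. The sketch below computes how much better the 4850 looks than the GTX280 on performance per watt and per dollar, taking AMD's "about 75 percent" figure at face value; that is a vendor claim, not an independent benchmark.

```python
# Relative performance per watt and per dollar, using AMD's own claim that the
# 4850 delivers ~75% of a GTX280. Performance is normalized to the GTX280 = 1.0.
gtx280 = {"perf": 1.00, "watts": 236, "price": 649}
hd4850 = {"perf": 0.75, "watts": 110, "price": 199}


def ratio(metric: str) -> float:
    """How many times better the 4850 is than the GTX280 on perf per `metric`."""
    return (hd4850["perf"] / hd4850[metric]) / (gtx280["perf"] / gtx280[metric])


print(f"perf per watt:   {ratio('watts'):.2f}x")
print(f"perf per dollar: {ratio('price'):.2f}x")
```

If the 75 percent claim holds, the 4850 offers roughly 1.6 times the performance per watt and about 2.4 times the performance per dollar of the GTX280.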
Because memory bandwidth on the latest Nvidia architecture is not growing as fast as its processing resources, some applications may be memory bound, said Andy Keane, general manager of GPU computing at Nvidia.
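One common way to reason about Keane's point is a roofline-style bound: attainable throughput is the lesser of peak compute and memory bandwidth times arithmetic intensity (flops per byte moved). The sketch below pairs the article's round figures, about 1 TFlop of compute and about 102 GB/s of bandwidth. Treating them as belonging to the same part is an assumption, and the roofline model is a swapped-in analysis technique, not something the article describes.

```python
# Minimal roofline-style estimate of when a GPU kernel is memory bound.
# Assumption: ~1 TFlop peak compute and ~102 GB/s bandwidth on the same part.
PEAK_GFLOPS = 1000.0
PEAK_GBPS = 102.0


def attainable_gflops(flops_per_byte: float) -> float:
    """Roofline bound: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(PEAK_GFLOPS, PEAK_GBPS * flops_per_byte)


# Arithmetic intensity needed before compute, not memory, becomes the limit.
ridge_point = PEAK_GFLOPS / PEAK_GBPS  # ~9.8 flops per byte

# Example: single-precision SAXPY (y = a*x + y) does 2 flops per element
# while moving 12 bytes (read x, read y, write y; 4 bytes each).
saxpy_intensity = 2 / 12

print(f"ridge point: {ridge_point:.1f} flops/byte")
print(f"SAXPY bound: {attainable_gflops(saxpy_intensity):.0f} GFLOPS")
```

Under these assumptions a kernel needs roughly 10 flops per byte to approach peak compute, while a bandwidth-heavy kernel like SAXPY is capped near 17 GFLOPS, under 2 percent of peak.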
AMD’s Rick Bergman said the company’s focus on a more mainstream design will let it roll out a notebook version this fall that consumes less than 70 W. “There’s no way this new Nvidia core will be in notebooks this fall,” Bergman said.