For a matrix of size 2Kx2K they achieved 11.05 Gflop/s, which is around 75% of the double precision peak. They also implemented a single precision version of the code, which achieved 155 Gflop/s (again around 75% efficiency) for a matrix of size 4Kx4K. Unfortunately, a single precision algorithm does not legitimately implement the Linpack benchmark. Our initial implementation of the mixed-precision Linpack benchmark placed the CELL processor on the Linpack Report with performance close to 100 Gflop/s.
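The two efficiency figures above can be checked with a short calculation. This is only a sketch: the peak values are assumptions that do not appear in this section (14.6 Gflop/s is the commonly quoted double precision peak of the current 8-SPE chip, and 204.8 Gflop/s assumes 8 single precision flops per cycle per SPE at 3.2 GHz), and the function name `efficiency` is chosen here for illustration.

```python
# Assumed theoretical peaks for an 8-SPE CELL at 3.2 GHz (not stated in the text).
PEAK_DOUBLE_GFLOPS = 14.6    # commonly quoted figure for the current chip
PEAK_SINGLE_GFLOPS = 204.8   # 8 SPEs x 3.2 GHz x 8 flops per cycle

def efficiency(achieved_gflops, peak_gflops):
    """Return the fraction of the theoretical peak that a run sustained."""
    return achieved_gflops / peak_gflops

# Achieved rates are the ones quoted in the text.
double_eff = efficiency(11.05, PEAK_DOUBLE_GFLOPS)
single_eff = efficiency(155.0, PEAK_SINGLE_GFLOPS)

print(f"double precision: {double_eff:.1%}")
print(f"single precision: {single_eff:.1%}")
```

Under these assumed peaks both ratios land just above 0.75, consistent with the "around 75%" quoted for each run.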
One way of looking at the CELL processor is to treat it as eight digital signal processors (DSPs), augmented with a control processor, on a single chip.
One of the major shortcomings of the current CELL processor for numerical applications is the relatively slow speed of its double precision arithmetic. The next incarnation of the CELL processor is going to include a fully pipelined double precision unit, which will deliver 12.8 Gflop/s from a single SPE clocked at 3.2 GHz and 102.4 Gflop/s from an 8-SPE system, making the chip a very strong competitor in the world of scientific and engineering computing. In addition, the current CELL processor employs a rather modest 234 million transistors. It is not hard to envision a CELL processor with more than one PPE and many more SPEs, perhaps reaching a performance of 1 Tflop/s on a single chip.
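The peak figures quoted for the next CELL follow from a simple product of SPE count, clock rate, and flops per cycle. The sketch below assumes 4 double precision flops per cycle per SPE, a value inferred from 12.8 Gflop/s at 3.2 GHz and not stated explicitly in the text. The SPE count needed for 1 Tflop/s is purely hypothetical and assumes the same clock and per-SPE rate.

```python
import math

def peak_gflops(num_spes, clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflop/s: SPEs x clock (GHz) x flops per cycle."""
    return num_spes * clock_ghz * flops_per_cycle

# Inferred, not quoted: 12.8 Gflop/s / 3.2 GHz = 4 flops per cycle per SPE.
FLOPS_PER_CYCLE_DP = 4

per_spe = peak_gflops(1, 3.2, FLOPS_PER_CYCLE_DP)   # single SPE
chip = peak_gflops(8, 3.2, FLOPS_PER_CYCLE_DP)      # 8-SPE system

# Hypothetical: SPEs needed for 1 Tflop/s at the same clock and per-SPE rate.
spes_for_teraflop = math.ceil(1000.0 / per_spe)

print(f"per SPE: {per_spe:.1f} Gflop/s, 8 SPEs: {chip:.1f} Gflop/s")
print(f"SPEs for 1 Tflop/s (double precision): {spes_for_teraflop}")
```

With these assumptions, a double precision TeraFlop/s chip would need roughly ten times the current SPE count, which gives a sense of scale for "many more SPEs".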
The Cell2 is expected in 2008 and will initially be used in the Roadrunner supercomputer.
The eight PS3s probably reach a combined 500-800 Gflop/s of performance for $3200.
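As a rough illustration of what that estimate implies, the cost per Gflop/s can be worked out directly. The sketch uses only the figures above (eight consoles, $3200, 500-800 Gflop/s); the function name `dollars_per_gflops` is made up for this example, and the performance range is itself a guess rather than a measurement.

```python
def dollars_per_gflops(total_cost_usd, gflops):
    """Cost in dollars for each Gflop/s of combined performance."""
    return total_cost_usd / gflops

TOTAL_COST_USD = 3200.0
NUM_CONSOLES = 8

cost_per_console = TOTAL_COST_USD / NUM_CONSOLES
best_case = dollars_per_gflops(TOTAL_COST_USD, 800.0)   # optimistic end of the range
worst_case = dollars_per_gflops(TOTAL_COST_USD, 500.0)  # pessimistic end of the range

print(f"cost per console: ${cost_per_console:.0f}")
print(f"cost per Gflop/s: ${best_case:.2f} to ${worst_case:.2f}")
```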