The Godson-3A chip was implemented in a 65 nanometer process and ran at 1 GHz to deliver 16 gigaflops of floating point oomph. The chip had 425 million transistors, an area of 174.5 square millimeters, and burned only 10 watts under load. It included two 16-bit HyperTransport ports (licensed from Advanced Micro Devices), 4 MB of L2 cache, and two on-chip memory controllers that supported either DDR2 or DDR3 main memory.
The Godson chip effort is, in fact, one of 16 different projects, each funded with between $5bn and $10bn. The massive projects focus on specific technology areas that China reckons are key for its technological independence and economic future, including processors and operating systems, chip process technology, 4G wireless networks, nuclear fission power plants, water pollution control and treatment, aircraft design and construction, high-resolution satellite imaging, and manned spaceflight and lunar exploration.
With the Godson-3B, which is what Hu was there to talk about in San Francisco, ICT is sticking with the same 65 nanometer CMOS process and running the chip at the same 1 GHz. But the chip is bumped up to eight cores from four and has two 256-bit vector co-processors per core. The chip has two HyperTransport ports and two DDR3 memory controllers, and weighs in at 583 million transistors in a 300 square millimeter area. Running at 1 GHz, peak performance on those vector units is 128 gigaflops, with the chip only emitting 40 watts. According to early tests, the cores burn about 28.9 watts, while the uncore parts of the chip (HT, memory controllers, and crossbar switches for linking chips together) consume 11.1 watts.
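The figures quoted for the two chips can be cross-checked with a little arithmetic. The sketch below is only an illustration built from the numbers in this article (the dictionary layout and function names are made up for the example). It derives peak flops per cycle per core and gigaflops per watt, and checks that the core and uncore power figures add up to the 40 watt total.

```python
import math

# Figures as quoted in the article; the dict layout is just for illustration.
GODSON_3A = {"cores": 4, "clock_ghz": 1.0, "peak_gflops": 16.0, "watts": 10.0}
GODSON_3B = {"cores": 8, "clock_ghz": 1.0, "peak_gflops": 128.0, "watts": 40.0}

# Early-test power split quoted for the Godson-3B.
CORE_WATTS = 28.9
UNCORE_WATTS = 11.1


def flops_per_cycle_per_core(chip):
    """Peak floating point operations each core retires per clock cycle."""
    return chip["peak_gflops"] / (chip["cores"] * chip["clock_ghz"])


def gflops_per_watt(chip):
    """Peak gigaflops delivered per watt of chip power."""
    return chip["peak_gflops"] / chip["watts"]


def power_budget_matches(chip):
    """True if the core plus uncore figures sum to the chip's quoted wattage."""
    return math.isclose(CORE_WATTS + UNCORE_WATTS, chip["watts"])


if __name__ == "__main__":
    for name, chip in (("Godson-3A", GODSON_3A), ("Godson-3B", GODSON_3B)):
        print(name,
              f"{flops_per_cycle_per_core(chip):.0f} flops/cycle/core,",
              f"{gflops_per_watt(chip):.1f} gigaflops/watt")
    print("3B power budget adds up:", power_budget_matches(GODSON_3B))
```

On these quoted peak numbers, the Godson-3B does four times the work per core per clock and twice the gigaflops per watt of the Godson-3A. These are peak ratings, not measured application performance.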
According to Hu, the vector extension units in the Godson-3B and Godson-2H processors have 128-entry, 256-bit register files and more than 300 SIMD instructions that have been added to the MIPS architecture.
The Godson-3B processor will be used in the Dawning 6000 petaflops supercomputer, which China will be tweaking in 2012.
At ISSCC this week, ICT showed off a system board for a 1U rack server.
This IU2T system board packs 16 of the eight-core Godson-3B processors onto a single board and is rated at 2 teraflops, and a rack of these puppies would yield 42 teraflops. So instead of the hundreds of cabinets it can take to reach 1 petaflops of raw number-crunching performance with big x64-based machines, ICT could, in theory, do it with 24 racks.
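The board and rack sums are easy to reproduce. The sketch below uses only the figures quoted in this article: 16 chips per board, 128 gigaflops per chip, and 42 teraflops per rack. The variable names are made up for the example, and it does not try to guess how many boards sit in a rack, since the article does not say.

```python
import math

# Figures quoted in the article.
CHIPS_PER_BOARD = 16
GFLOPS_PER_CHIP = 128      # Godson-3B peak at 1 GHz
RACK_TFLOPS = 42           # per-rack figure as stated in the article
TARGET_TFLOPS = 1000       # 1 petaflops

# Peak rating of one board, in teraflops (rounds to the "2 teraflops" quoted).
board_tflops = CHIPS_PER_BOARD * GFLOPS_PER_CHIP / 1000

# Whole racks needed to reach the target; a partial rack still counts as one.
racks_needed = math.ceil(TARGET_TFLOPS / RACK_TFLOPS)

if __name__ == "__main__":
    print(f"Board peak: {board_tflops:.3f} teraflops")
    print(f"Racks for 1 petaflops: {racks_needed}")
```

Dividing 1,000 teraflops by 42 gives a little under 24, which rounds up to the 24 racks cited above.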
ICT is not going to stop here. The Godson-3C design will shift to a 28 nanometer process and, according to the roadmap, will come in eight-core variants like the Godson-3B as well as a 16-core variant. The Godson-3C will have faster clock speeds, too, running at between 1.5 GHz and 2 GHz. ICT says the Godson-3C will deliver 512 gigaflops of raw performance on math work, and the way the math works, that is twice as much math from moving from 1 GHz to 2 GHz and then a doubling again as the core count goes from 8 to 16. This chip is expected sometime around late 2012 or early 2013.
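The scaling argument behind the 512 gigaflops figure can be written as a one-line formula. The sketch below assumes, as the paragraph above does, that peak performance scales linearly with both clock speed and core count and that per-core throughput per cycle is unchanged from the Godson-3B. The function name is made up for the example, and results other than the 512 gigaflops case are extrapolations, not ICT figures.

```python
def scaled_peak_gflops(base_gflops, base_ghz, base_cores, new_ghz, new_cores):
    """Scale a peak rating linearly with clock speed and core count.

    Assumes the work done per core per cycle stays the same.
    """
    return base_gflops * (new_ghz / base_ghz) * (new_cores / base_cores)


# Godson-3B baseline from the article: 128 gigaflops, 1 GHz, eight cores.
top_end = scaled_peak_gflops(128.0, 1.0, 8, new_ghz=2.0, new_cores=16)

# Extrapolation only: a 16-core part at the low end of the clock range.
low_end = scaled_peak_gflops(128.0, 1.0, 8, new_ghz=1.5, new_cores=16)

if __name__ == "__main__":
    print(f"16 cores at 2.0 GHz: {top_end:.0f} gigaflops")
    print(f"16 cores at 1.5 GHz: {low_end:.0f} gigaflops")
```

Under that linear assumption, the 16-core part only reaches 512 gigaflops at the top of the quoted clock range.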