Europe plans to use smartphone ARM chips and GPUs to build an exaflop supercomputer that is 30 to 50 times more energy efficient than today's best supercomputers

Since October 2011, the aim of the European project called Mont-Blanc has been to design a new type of computer architecture capable of setting future global HPC standards, built from the energy-efficient components used in embedded and mobile devices. The project is coordinated by the Barcelona Supercomputing Center (BSC) and had a budget of over 14 million euros, including over 8 million euros from the European Commission. Two years later, the European Commission granted an additional 8 million euros to extend the Mont-Blanc project's activities until September 2016.

This three-year extension will enable further development of the OmpSs parallel programming model to automatically exploit multiple cluster nodes, transparent application checkpointing for fault tolerance, support for ARMv8 64-bit processors, and the initial design of the Mont-Blanc Exascale architecture.

The Mont-Blanc project brings together leading researchers from Spain, the UK, France, Italy and Germany with the aim of delivering supercomputers that could revolutionise the way we work. These new machines would be built around ‘exascale processors’ – processors that can carry out on the order of 10^18 (a 1 followed by eighteen zeros) operations a second.

These new processors won’t just deliver higher performance, some nine orders of magnitude beyond your existing desktop or laptop processor; they will also consume less energy. According to Mr Ramirez, the processors that the Mont-Blanc project is using will consume between 15 and 30 times less energy than the systems we use today.

Supplying one megawatt of power continuously costs about $1 million per year, so a supercomputer that needed 100 megawatts would cost about $100 million per year in electricity alone. The current largest supercomputers (in China and the USA) need 20-30 megawatts.
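The cost rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are made up for this example, and the implied electricity price assumes the machine draws its full power around the clock (8,760 hours a year), which the article does not state.

```python
# Rule of thumb quoted in the article: about $1 million per megawatt-year.
COST_PER_MW_YEAR_USD = 1_000_000


def annual_power_cost_usd(megawatts: float) -> float:
    """Yearly electricity cost for a machine drawing `megawatts` continuously."""
    return megawatts * COST_PER_MW_YEAR_USD


def implied_price_per_kwh(hours_per_year: float = 8760.0) -> float:
    """Electricity price (USD per kWh) implied by the rule of thumb.

    Assumes round-the-clock operation; this is an assumption, not a
    figure from the article.
    """
    kwh_per_mw_year = 1000.0 * hours_per_year  # 1 MW = 1000 kW
    return COST_PER_MW_YEAR_USD / kwh_per_mw_year


if __name__ == "__main__":
    for mw in (20, 30, 100):
        print(f"{mw} MW -> ${annual_power_cost_usd(mw):,.0f} per year")
    print(f"implied price: ${implied_price_per_kwh():.3f} per kWh")
```

Under that assumption the rule of thumb corresponds to an electricity price of roughly 11 cents per kilowatt-hour.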

Mont Blanc High Performance Blade Server

At present, billions of high performance computing cycles are offered as a service to businesses and researchers in the manufacturing, pharmaceutical and financial services industries via the PRACE (Partnership for Advanced Computing in Europe) project. PRACE gives access to six high performance computing clusters that offer, between them, nearly 20 petaflops (20 quadrillion operations per second) of processing power. But, impressive as this is, the PRACE resources cannot meet the current demand for high performance computing from research and industry, and the available processing power is still nearly two orders of magnitude short of exascale processing.
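As a rough check on the size of that gap, the following Python sketch compares the combined PRACE capacity with one exaflop. It treats "nearly 20 petaflops" as exactly 20 petaflops, which is an assumption made for the sake of the arithmetic, and the function name is hypothetical.

```python
import math
from typing import Tuple

EXAFLOPS = 1e18       # 10^18 operations per second
PRACE_TOTAL = 20e15   # "nearly 20 petaflops", rounded up to 20 for this sketch


def gap_to_exascale(current_ops_per_second: float) -> Tuple[float, float]:
    """Return (multiplicative gap, gap in orders of magnitude) to one exaflop."""
    factor = EXAFLOPS / current_ops_per_second
    return factor, math.log10(factor)


if __name__ == "__main__":
    factor, orders = gap_to_exascale(PRACE_TOTAL)
    print(f"gap: {factor:.0f}x, or about {orders:.1f} orders of magnitude")
```

With these inputs the gap is a factor of 50, or about 1.7 orders of magnitude.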

In November 2013, the first test units of the Mont-Blanc prototype were presented at the SC13 conference held in Denver, USA. The Mont-Blanc compute cards deliver considerably higher performance at 50% lower energy consumption than previous ARM-based developer platforms.

The Mont-Blanc prototype is based on the Samsung Exynos 5 Dual SoC, which integrates a dual-core ARM Cortex-A15 and an on-chip ARM Mali-T604 GPU and is already market-proven in advanced mobile devices. The dual-core ARM Cortex-A15 delivers twice the performance of the quad-core ARM Cortex-A9 used in the previous generation of ARM-based prototypes, whilst consuming 20% less energy for the same workload. Furthermore, the on-chip ARM Mali-T604 GPU provides 3.5 times higher performance than the dual-core Cortex-A15, whilst consuming half the energy for the same workload.
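To see how those ratios stack up, here is a small Python sketch that normalises everything to the quad-core Cortex-A9. It assumes the quoted ratios refer to the same workload and can simply be multiplied together, which the article does not confirm; the class and variable names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relative:
    name: str
    performance: float  # throughput relative to the quad-core Cortex-A9
    energy: float       # energy for the same workload, relative to the A9


# Baseline: the previous-generation quad-core Cortex-A9.
A9 = Relative("quad-core Cortex-A9", 1.0, 1.0)

# "Twice the performance ... whilst consuming 20% less energy".
A15 = Relative("dual-core Cortex-A15", A9.performance * 2.0, A9.energy * 0.8)

# "3.5 times higher performance than the dual-core Cortex-A15,
# whilst consuming half the energy". Chaining the ratios is an assumption.
MALI = Relative("Mali-T604 GPU", A15.performance * 3.5, A15.energy * 0.5)

if __name__ == "__main__":
    for part in (A9, A15, MALI):
        print(f"{part.name}: {part.performance:.1f}x performance, "
              f"{part.energy:.1f}x energy")
```

Read this way, the GPU would be about 7 times faster than the older Cortex-A9 on a suitable workload while using about 40% of the energy.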

Each Mont-Blanc compute card integrates one Samsung Exynos 5 Dual SoC, 4 GB of DDR3-1600 DRAM, a microSD slot for local storage and a 1 GbE NIC, all on an 85 × 56 mm card (3.3 × 2.2 inches). A single Mont-Blanc blade integrates fifteen Mont-Blanc compute cards and a 1 GbE crossbar switch, which is connected to the rest of the system via two 10 GbE links. Nine Mont-Blanc blades fit into a standard BullX 9-blade INCA chassis. A complete Mont-Blanc rack hosts up to six such chassis, providing a total of 1620 ARM Cortex-A15 cores and 810 on-chip ARM Mali-T604 GPU accelerators and delivering 26 TFLOPS of peak performance.
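The rack totals follow directly from the per-card and per-blade figures, as this short Python sketch shows. The per-card peak and the number of racks needed for one exaflop are derived here purely for a sense of scale; they are not figures from the project, and nothing in the article suggests the prototype itself is meant to be scaled that far.

```python
import math

CARDS_PER_BLADE = 15
BLADES_PER_CHASSIS = 9
CHASSIS_PER_RACK = 6
CORES_PER_CARD = 2        # dual-core Cortex-A15
GPUS_PER_CARD = 1         # one on-chip Mali-T604
DRAM_GB_PER_CARD = 4
RACK_PEAK_TFLOPS = 26.0   # peak figure quoted for a full rack


def rack_summary() -> dict:
    """Totals for one fully populated rack."""
    cards = CARDS_PER_BLADE * BLADES_PER_CHASSIS * CHASSIS_PER_RACK
    return {
        "cards": cards,
        "cpu_cores": cards * CORES_PER_CARD,
        "gpus": cards * GPUS_PER_CARD,
        "dram_gb": cards * DRAM_GB_PER_CARD,
        # Derived value: rack peak divided evenly across the cards.
        "peak_gflops_per_card": RACK_PEAK_TFLOPS * 1000 / cards,
    }


def racks_for_exaflop() -> int:
    """Racks of this prototype needed to reach 1 exaflop of peak performance."""
    exaflop_in_tflops = 1_000_000
    return math.ceil(exaflop_in_tflops / RACK_PEAK_TFLOPS)


if __name__ == "__main__":
    for key, value in rack_summary().items():
        print(f"{key}: {value}")
    print(f"racks for 1 exaflop (peak): {racks_for_exaflop()}")
```

The sketch reproduces the 810 cards, 1620 cores and 810 GPUs quoted above, and puts each card at roughly 32 GFLOPS of peak performance.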
