Jack Dongarra interview by Sander Olson. Dr. Dongarra is the Director of the Innovative Computing Laboratory and the Director of the Center for Information Technology Research, both at the University of Tennessee. Dr. Dongarra is an expert in high-performance computing, and will be one of the individuals most responsible for creating the infrastructure necessary to build an exaflop supercomputer. Dr. Dongarra predicts that by 2020, cellphones will have teraflop performance, laptops will have 10 teraflop performance, and supercomputers will achieve exaflop performance.
Question: You have been working in the field of high-performance computing for decades. Did you imagine three decades ago that petaflop computers would ever exist?
Answer: Three decades is an eon in the computer industry. The laptop I use today would have been considered a supercomputer fifteen years ago. Ten years ago, we were planning for petascale computation. We had ideas as to how to build such a system, but most of those ideas were wrong, because they were not based on commodity processors. Now we do have petaflop (10^15 FLOPS) systems, and we are committed to building an exaflop machine within the next decade.
Question: Is building an exaflop machine the primary goal of the Innovative Computing Laboratory, which you run?
Answer: Our mission at the Innovative Computing Laboratory here at the University of Tennessee is to be a world leader in enabling technologies and software for scientific computing. Our vision is to provide high performance tools to tackle science’s most challenging problems and to play a major role in the development of standards for scientific computing in general. We are doing the research needed to develop the new tools that an exascale system will require.
Question: Although several petaflop supercomputers are now up and running, these machines each cost in the tens of millions of dollars. To what extent can sustained petaflop computing become affordable?
Answer: Supercomputers are now so expensive that they are designed more for communities than individuals. Getting time on these machines requires creating a proposal to use these costly and sophisticated resources. Ten years ago, teraflop machines represented the apex of supercomputers. But teraflop performance is now readily available for individual scientists. Similarly, I predict that ten years from now petaflop computing will be sufficiently inexpensive that it will be available even to individual researchers.
Question: Some semiconductor industry analysts are claiming that CMOS scaling will end in 2014. Are you concerned about how the end of Moore’s law will affect high-performance computing?
Answer: Most hardware researchers predict that silicon CMOS can continue scaling for the next ten years. Moore’s law – performance doubling every two years – should continue for the next decade. So the hardware path, unlike the software path, is clear.
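To make the doubling arithmetic in this answer concrete, here is a minimal sketch in Python. It takes the answer's "doubling every two years" at face value. The function names are illustrative only, and nothing here is a forecast.

```python
import math

def growth_factor(years, doubling_period_years=2.0):
    """Cumulative gain after `years` if performance doubles every period."""
    return 2.0 ** (years / doubling_period_years)

def years_to_gain(factor, doubling_period_years=2.0):
    """Years of steady doubling needed to reach a given overall gain."""
    return doubling_period_years * math.log2(factor)

# One decade of doubling every two years.
decade_gain = growth_factor(10)

# The petaflop-to-exaflop step discussed in this interview is a 1000x gain.
years_for_1000x = years_to_gain(1000)

print(f"Gain over ten years: {decade_gain:.0f}x")
print(f"Years of doubling needed for 1000x: {years_for_1000x:.1f}")
```

On these assumptions a decade of per-chip doubling yields about a 32-fold gain. Under this simple model the rest of a thousand-fold jump would have to come from other sources, such as using many more processors in parallel.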
Question: Speaking of software, the Jaguar supercomputer achieves a Linpack benchmark performance of 1.7 petaflops. How close can Jaguar get to such speeds while running actual applications?
Answer: There are actual applications running on Jaguar in the fields of material science and nanotechnology that exceed a petaflop. Unfortunately, only a handful of applications today can get that close to petaflop performance.
Question: Is this because those applications cannot be rewritten to achieve better performance, or because no one has bothered to recompile the software?
Answer: It is usually because the applications simply cannot be sped up. This may be due to algorithm issues, data movement, memory latency, or load balance among the many processors.
Question: Is it feasible to design a supercomputer specifically to run an application optimally?
Answer: Yes, for a sufficiently important problem, dedicated hardware can be built to solve it. There is a Japanese supercomputer that works only on gravitational wave computations. That approach is the most efficient and cost-effective way to solve specific problems, but the machine is effectively useless for anything else.
Question: Will the IBM/Sony/Toshiba Cell chip play any role in future HPC systems, or is it effectively a dead end?
Answer: The Cell architecture is no longer being developed, so it is effectively dead. No new supercomputers will use Cell.
Question: Several supercomputers in the top 500 already contain GPUs. What proportion of supercomputers will contain GPUs 5 years from now?
Answer: The obvious upside of GPUs is that they provide compelling performance for modest prices. The downside is that they are more difficult to program, since at the very least you will need to write one program for the CPUs and another program for the GPUs. Another problem that GPUs present pertains to the movement of data. Any machine that requires a lot of data movement will never come close to achieving its peak performance. The CPU-GPU link is a thin pipe, and that becomes the strangle-point for the effective use of GPUs. In the future this problem will be addressed by having the CPU and GPU integrated in a single socket.
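The "thin pipe" point can be illustrated with a rough timing model. The sketch below is in Python and assumes that transfers over the CPU-GPU link do not overlap with computation. The peak rate (500 GFLOP/s) and link bandwidth (8 GB/s) are placeholder values chosen for illustration, not measurements of any particular GPU, and `effective_gflops` is a hypothetical helper.

```python
def effective_gflops(flops, bytes_moved, peak_gflops, link_gb_per_s):
    """Sustained GFLOP/s when link transfers and compute do not overlap."""
    compute_s = flops / (peak_gflops * 1e9)
    transfer_s = bytes_moved / (link_gb_per_s * 1e9)
    return flops / (compute_s + transfer_s) / 1e9

PEAK_GFLOPS = 500.0    # assumed GPU peak, double precision
LINK_GB_PER_S = 8.0    # assumed CPU-GPU link bandwidth

# Vector add c = a + b on n doubles: n flops, three n-element arrays moved.
n = 100_000_000
vector_add = effective_gflops(n, 3 * 8 * n, PEAK_GFLOPS, LINK_GB_PER_S)

# Dense matrix multiply of m x m doubles: 2*m^3 flops, three matrices moved.
m = 4096
matmul = effective_gflops(2 * m**3, 3 * 8 * m**2, PEAK_GFLOPS, LINK_GB_PER_S)

print(f"vector add:      {vector_add:8.2f} GFLOP/s of {PEAK_GFLOPS:.0f} peak")
print(f"matrix multiply: {matmul:8.2f} GFLOP/s of {PEAK_GFLOPS:.0f} peak")
```

Under these assumed numbers the vector add, which moves many bytes per operation, sustains well under 1 GFLOP/s, while the matrix multiply reuses its data enough to keep most of the assumed peak. Real systems can overlap transfers with compute, so this is a pessimistic simplification.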
Question: What role will Nvidia’s Fermi cards have on the HPC field?
Answer: Fermi is currently Nvidia’s flagship, and it has a number of features that make it advantageous for high performance computing. The architecture supports up to 512 CUDA cores, so that alone is a considerable amount of computing horsepower. The GPU also contains L1 and L2 cache memories, supports double-precision IEEE floating point arithmetic, and also fully supports Error Correcting Code memory. Fermi is already being used on the Chinese Nebulae supercomputer, which placed second on the top 500 list of supercomputers. Within the next several years, a significant number of HPC systems will probably contain Fermi GPUs.
Question: What impact will cloud computing have on HPC?
Answer: Cloud computing is an attractive tool for certain types of computing. For data intensive applications, cloud computing makes sense. However, since the resources are geographically distributed, cloud computing will never make sense for tightly-coupled applications that suffer from bandwidth and latency issues. Since most applications running on supercomputers require tight coupling of data, cloud computing will never obviate the need for standalone machines.
Question: What is the optimal memory paradigm for exaflop systems?
Answer: The current memory paradigm is hierarchical, based on registers, L1 and L2 caches, local memory, shared memory, and distributed memory among nodes. That is a potential model for exaflop systems. However, we want exaflop systems to be designed to be relatively easy to program. We therefore want a globally shared address space, and explicit methods to pass data between the processors in order to orchestrate the unfolding computation. That paradigm may be necessary for a machine that has a billion threads.
Question: How quickly will China become a major player in the HPC space?
Answer: China is already a major player. In 2001, China had no machines in the top500 supercomputer list. Today they have more computing power than Japan, which is considered a supercomputing powerhouse, and I predict that within a year they will have surpassed the entire European Union with its 27 nations. China may soon have the fastest computer on the planet, and should soon have scores of systems in the top500.
Question: What are the base specifications for an exaflop machine?
Answer: The maximum price will be no more than $200 million, and the maximum power budget will be 20 megawatts. It will contain about 64 petabytes of RAM, so that alone will probably cost $100 million. Given that our Jaguar now consumes 7 megawatts, keeping within a 20 megawatt budget will be a major challenge.
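The figures in this answer imply some per-unit targets, worked through below in a short Python sketch. It uses only numbers quoted in this interview (1.7 petaflops Linpack and 7 megawatts for Jaguar, 64 petabytes and $100 million for memory). It assumes decimal units (1 PB = 10^15 bytes), and the variable names are illustrative.

```python
EXAFLOP = 1e18             # target floating point operations per second
POWER_BUDGET_W = 20e6      # 20 megawatts
MEMORY_BYTES = 64e15       # 64 petabytes, decimal units assumed
MEMORY_COST_USD = 100e6    # rough memory cost quoted in the answer

JAGUAR_FLOPS = 1.7e15      # Linpack figure quoted earlier in the interview
JAGUAR_POWER_W = 7e6       # 7 megawatts

# Energy efficiency in GFLOP/s per watt.
target_gflops_per_w = EXAFLOP / POWER_BUDGET_W / 1e9
jaguar_gflops_per_w = JAGUAR_FLOPS / JAGUAR_POWER_W / 1e9
efficiency_gap = target_gflops_per_w / jaguar_gflops_per_w

# Memory cost and memory-to-compute ratio.
usd_per_gb = MEMORY_COST_USD / (MEMORY_BYTES / 1e9)
bytes_per_flop = MEMORY_BYTES / EXAFLOP

print(f"Target efficiency: {target_gflops_per_w:.0f} GFLOP/s per watt")
print(f"Jaguar efficiency: {jaguar_gflops_per_w:.2f} GFLOP/s per watt")
print(f"Required gain:     {efficiency_gap:.0f}x")
print(f"Memory price:      ${usd_per_gb:.2f} per GB")
print(f"Memory per flop/s: {bytes_per_flop:.3f} bytes")
```

On these particular inputs the target works out to 50 GFLOP/s per watt, roughly two hundred times Jaguar's Linpack efficiency. A different baseline machine or a sustained-application figure would give a different gap.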
Question: Those specs will require you to increase power efficiency by three orders of magnitude.
Answer: There are two models that we can use to get to an exaflop while staying within a 20 megawatt budget. The first model employs huge numbers of lightweight processors, such as the IBM Blue Gene processor running at 1 GHz. If we use 1 million chips, and each chip has 1,000 cores, then we can get to a potential billion threads of execution. The other approach is a hybrid that makes extensive use of coprocessors or GPUs. It would use a 1 GHz processor with 10,000 floating point units per socket, and 100,000 sockets per system. So either path could work.
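Both configurations arrive at the same total, which a few lines of Python make explicit. The sketch assumes each core or floating point unit retires one floating point operation per clock cycle. That assumption is not stated in the answer, and `peak_flops` is an illustrative helper.

```python
CLOCK_HZ = 1e9           # 1 GHz, as in both models
FLOPS_PER_CYCLE = 1      # assumption: one operation per unit per cycle

def peak_flops(units, clock_hz=CLOCK_HZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak rate for `units` independent cores or floating point units."""
    return units * clock_hz * flops_per_cycle

# Model 1: lightweight processors, 1 million chips x 1,000 cores each.
lightweight_units = 1_000_000 * 1_000

# Model 2: hybrid, 100,000 sockets x 10,000 floating point units each.
hybrid_units = 100_000 * 10_000

for name, units in [("lightweight", lightweight_units), ("hybrid", hybrid_units)]:
    print(f"{name:12s} {units:.0e} units -> {peak_flops(units):.0e} FLOP/s peak")
```

Either way the machine needs about a billion concurrent units at 1 GHz to reach 10^18 operations per second at peak. The models differ in how those units are packaged, not in how many are needed.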
Question: What do you see the high-performance computing landscape looking like in 2020?
Answer: By 2020, every system in the top500 list of supercomputers will offer petaflop performance or better. Cellphones will have teraflop performance, and laptops will have 10 teraflop performance. China could very well be a dominant force in supercomputing, and there will be several exaflop machines in operation. Having solved the myriad problems inherent in creating an exaflop machine, researchers will focus their efforts on designing and creating a zettaflop supercomputer.