Uzi Vishkin, Who Has a New Computer Programming Paradigm, Interview by Sander Olson

Here is the Uzi Vishkin interview. Dr. Vishkin is a Professor in the Department of electrical and computer engineering at the University of Maryland:

Dr. Vishkin has developed a new computer paradigm that has the potential to speed up many computer operations by 10x or even 100x. The new software developed by Dr. Vishkin, called eXplicit Multi Threading (XMT), is so easy to learn that high school students have been taught how to program it. Semiconductor makers are starting to show interest in this new technique given the dramatic speed gains that are possible. XMT could be standard on most desktops by 2013.

Explicit Multi-Threading (XMT): A PRAM-On-Chip Vision web page, which has technical information and tutorials

Question: Why are current software techniques poorly suited to take advantage of current multicore processors?

Answer: Students in computer science are still learning the C programming language, which has been used since the 1970s and which is poorly suited to extract the performance of multicore processors. The “software spiral” is effectively broken, and new hardware gains will only result in performance improvements if coupled with software that is explicitly parallel. In order to take full advantage of the capabilities that multicore processors offer, the computer industry will need to embrace a paradigm shift. Quite a few parallel programming languages have been developed over the years for supercomputers, which are also parallel machines . However, programming them is considered a challenge, and has repelled all but the most devoted programmers. We now have a new paradigm, called eXplicit Multi-Threading (XMT). XMT offers the best chance to take advantage of the benefits of multi-core processors.

Question: Tell us about XMT. How is it superior to current approaches?

Answer: XMT comprises a workflow process that systematically guides the programmer from finding a fast parallel algorithm to producing a parallel program. I developed the XMT paradigm in the late 1990s. It evolved from an earlier concept known as the Parallel Random-Access Machine/model (PRAM), which was an early attempt at parallel programming. The PRAM attempt was considered as having a great potential for a breakthrough in parallel computing, but until the arrival of the XMT paradigm with it, it was not clear how reduce this potential to practice. XMT requires a coordinated hardware and software approach to work optimally. At its core it is an extremely fine-grained approach to programming which is far superior to the course-grained programming techniques employed today. The XMT’s fine-grained and irregular approach to parallel programming is so easy to use that even high school kids can write efficient programs.

Question: How does XMT work?

Answer: In XMT, the programming task is broken down, or “spawned” into threads that are very fine-grained. The instructions are essentially “broadcast” to all the cores of a chip instead of being individually sent to each core. Each thread advances on its own, rather than being synchronized with the other threads. The individual threads are then joined together when completed. So there are hardware algorithms for partitioning and merging the threads.

Question: What are the main advantages of XMT?

Answer: XMT provides several potential advantages over competing solutions. First and foremost, It is highly efficient, and can take full advantage of the latent power of PRAM programming, a concept that provides the largest knowledge based and easiest known approach to parallel programming. It is highly scalable, and can seamlessly scale to 1,000 cores or more. It is also easy to learn and program – a group of students at Thomas Jefferson high school in Virginia has already learned how to program using XMT.

: What sort of performance speedups did you achieve with this hardware? What speed improvements could one expect with most applications?

Answer: Compared to a core-2 duo processor, we managed a 10x improvement for our 64-processor XMT prototype, which in silicon is the same size as the Core-2 duo. We believe this to be indicative of the performance speedups that are possible. Although performance improvements will vary depending on the application, we predict substantial speedups of most applications. For some general-purpose applications, 100x speedups should be feasible with a 1000-processor XMT chip.

Question: Tell us more about this prototype XMTmachine.

Answer: The prototype is a Field-Programmable Gate Array (FPGA) running at 75 MHz. It has 64 cores, and was specifically designed for XMT computing. It was so easy to build that it was designed by a single graduate student in only 2 years using standard Verilog software. In silicon, the chip is about the same size as a single commodity CPU core. Performance would obviously increase if it were to be fabbed using more modern lithography, such as 45 nm transistors.

Question: How radically does hardware need to be redesigned in order to take full advantage of XMT?

Answer: The changes are not radical. One needs to add a prefix-sum functional unit, which allows for fast coordination among parallel processors. An extension of the von-Neumann program- counter and stored-program apparatus is needed, along with a reduced synchrony on-chip interconnection network. A uniform memory architecture is also advantageous. Such changes are actually fairly minor and relatively easy to implement – some current and emerging chip designs would only need to be modified in order to take advantage of XMT.

Question: How long does it take to learn the XMT programming model?

Answer: Not long at all. The XMTC language is essentially C with two additional commands: SPAWN and PREFIX-SUM. This workflow, with multiple levels of abstraction, allows high-school and college programmers to do the same tasks as they are currently doing in their serial programming courses.

Question: How many programs could be rewritten in order to take advantage of XMT?

Answer: I believe that most programs could be modified to take advantage of XMT. Even programs such as data mining can be sped up substantially by using XMT. The objective now is to get a large number of programmers familiar with XMT concepts and programming.

Question: Is XMT intended to replace other programming models, such as Nvidia’s CUDA?

Answer: No, it is intended to supplement and enhance other programming models.

Question: How familiar is the computer industry with XMT?

Answer: The industry is becoming aware of this. I gave talks on XMT last year to Intel in the US, and in China. I am about to leave to give another series of talks on this in Haifa, Israel. I have also taught the XMT programming to college and high-school students. I am in talks with other researchers at semiconductor companies about these concepts. The industry will become increasingly interested in XMT as it becomes apparent that this is a necessary solution to the multicore programming problem.

Question: How long before the XMT concept becomes mainstream?

Answer: XMT could be standard on most desktops as early as 2013. XMT is already mentioned in 28 non-XMT patents, and working hardware has already demonstrated that is an order of magnitude speedup over conventional hardware running standard software. The industry realizes that a new workflow paradigm is needed for parallel programming, and the XMT paradigm works. XMT will be the enabling paradigm that allows the computer industry to break the multi-core programming bottleneck.