Bob Doud Interview by Sander Olson
In Q4 last year, multicore processor startup Tilera Inc. announced its third-generation processor family, with a flagship version containing 100 64-bit cores on a single chip. Following is a conversation Sander Olson of Next Big Future had with Bob Doud, Director of Marketing for Tilera.
Figure 1: Tilera’s mesh architecture for multicore
Question: How long has Tilera been focused on multicore processors?
Answer: Tilera was founded in 2004 based on technology developed at MIT. The founder and Tilera CTO, Dr. Anant Agarwal, has spent most of his career in multiprocessing and multicore computing. Tilera came out of stealth mode in 2007, and at that time we announced our first processor, the TILE64, with 64 cores, integrated memory controllers and high-speed I/O.
Question: What has Tilera done to warrant so much buzz within the semiconductor industry?
Answer: The reason for the recent excitement is the announcement of our 100-core model, which we call the TILE-Gx100 processor. It is built in 40 nm technology, clocks at speeds from 1 to 1.5 GHz, and offers performance per watt that is dramatically better than that of conventional CPUs. We will also be offering 16-, 36- and 64-core versions in the TILE-Gx family.
Question: The Gx100 is a 100 core, tile-based processor. How do you manage to get data to each of the 100 cores?
Answer: Typical processors have wide, fast and power-hungry buses or rings that traverse much of the chip. Although these are not particularly efficient, they work fine with 4 or 8 cores on a chip. Tilera instead uses a mesh architecture, which allows each tile to be wired only to its four direct neighbors. We actually have five meshes, which are functionally independent but operate simultaneously. Each tile has its own routing switch that passes data packets on toward the correct destination tile or memory controller. With all of these connections between cores, the Gx processor has over 200 terabits per second of interconnect bandwidth. That is about twice as much bandwidth as is theoretically needed, and this relatively light loading avoids contention and bottlenecks for messages.
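To make the per-tile routing switch concrete, here is a minimal Python sketch of packet forwarding on a 2-D mesh. It assumes dimension-order (X-then-Y) routing and a 10 x 10 layout for a 100-tile chip; the interview specifies neither, and the names `neighbors`, `next_hop` and `route` are hypothetical. Each switch picks the next hop from the destination coordinates alone, and every hop travels over a link to one of the (at most) four directly wired neighbors.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (column, row) position of a tile in the mesh


def neighbors(tile: Tile, width: int, height: int) -> List[Tile]:
    """Tiles directly wired to `tile`: up to four, fewer on the edges (no wraparound)."""
    x, y = tile
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < width and 0 <= cy < height]


def next_hop(current: Tile, dst: Tile) -> Tile:
    """Decision made by one tile's switch (assumed X-then-Y dimension-order routing)."""
    x, y = current
    if x != dst[0]:
        return (x + (1 if dst[0] > x else -1), y)  # correct the column first
    if y != dst[1]:
        return (x, y + (1 if dst[1] > y else -1))  # then correct the row
    return current  # packet has arrived


def route(src: Tile, dst: Tile) -> List[Tile]:
    """Follow next_hop from switch to switch; returns every tile visited."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst))
    return path


if __name__ == "__main__":
    width = height = 10  # hypothetical 10 x 10 array for a 100-tile chip
    path = route((0, 0), (9, 9))
    print(f"{len(path) - 1} hops from corner to corner")
    # Every hop stays on a link between direct neighbors.
    assert all(b in neighbors(a, width, height) for a, b in zip(path, path[1:]))
```

Running it prints `18 hops from corner to corner`. Under this routing scheme the hop count equals the Manhattan distance between source and destination, so traffic between nearby tiles stays local instead of crossing a chip-wide bus.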
Question: How difficult is the Gx100 to program? Can it run Windows?
Answer: It is not difficult at all to program. We run multiprocessing versions of Linux (SMP Linux), and one simply needs to compile and run standard C or C++ code. Programmers do need to use programming techniques that take advantage of multiple cores, but that is true of any modern multicore CPU. Although the Gx family could theoretically run Windows, we haven’t ported Windows to this platform. This chip is aimed more at enterprise/infrastructure applications where Linux is king rather than for the consumer market.
Question: Is this chip designed more as a CPU replacement or as a co-processor?
Answer: We see both models being used. The Gx100 can be used to do anything an x86 processor can do, only more efficiently and faster. Our current-generation TILEPro64 is roughly twice as fast as a high-end quad-core Nehalem processor and delivers about 7 times the performance per watt on enterprise application benchmarks. So in most circumstances a separate x86 CPU is unnecessary. The TILE-Gx family will increase our compute-per-watt lead even further.
Question: What sort of floating-point performance does this chip offer?
Answer: We designed the Gx family more for general-purpose integer computing and digital signal processing than for floating-point operations. We didn't build a large floating-point accelerator into the processor, since floating-point operations are very energy-intensive. This allows us both to reduce the die size and to save considerable amounts of energy. The chip is still capable of floating-point math, however, with performance comparable to that of a high-end x86 processor. Customers that require both integer and floating-point calculations could pair our systems with GPUs, which are designed specifically for floating-point math.
Question: Is the Gx100 appropriate for cloud computing?
Answer: The Gx100 is specifically designed to run cloud-based applications. We already run the standard “LAMP” stack (Linux, Apache server, MySQL database, and PHP scripting), and we have demonstrated a number of cloud-based apps at a fraction of the power consumption of traditional CPUs. Since energy and cooling costs contribute about half of the lifecycle expense of a datacenter, this is a huge motivation to switch. Data centers that move from x86 CPUs to our processors could achieve substantial annual savings simply by recompiling their software. We have some tier-1 manufacturers in the cloud space building standalone servers based on TILE processors.
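The cost argument can be checked with back-of-the-envelope arithmetic. The sketch below is an illustration only and rests on simplifying assumptions: energy and cooling are exactly half of lifecycle cost, the other half is unchanged by the switch, and the roughly 7x performance-per-watt figure cited in the interview applies directly to the energy share. Real savings depend on workload, utilization and hardware prices, none of which the interview specifies. The function name `relative_lifecycle_cost` is hypothetical.

```python
def relative_lifecycle_cost(energy_share: float, efficiency_gain: float) -> float:
    """Lifecycle cost after switching, as a fraction of the original cost.

    energy_share: fraction of lifecycle cost that is energy and cooling (0..1).
    efficiency_gain: factor by which performance per watt improves (> 0).
    Assumes the non-energy share of cost stays the same.
    """
    if not 0.0 <= energy_share <= 1.0:
        raise ValueError("energy_share must be between 0 and 1")
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    # Non-energy costs are untouched; energy costs shrink by the efficiency gain.
    return (1.0 - energy_share) + energy_share / efficiency_gain


if __name__ == "__main__":
    cost = relative_lifecycle_cost(energy_share=0.5, efficiency_gain=7.0)
    print(f"relative cost {cost:.3f}, savings {1 - cost:.1%}")
```

With these inputs the sketch gives a relative cost of about 0.571, i.e. savings of roughly 43%. Under the same assumptions the savings can never exceed the energy share itself (50%), which is approached only as the efficiency gain grows without bound.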
Question: Your processors aren’t binary compatible with Intel. Won’t this require users to recompile their code?
Answer: Sure. But if our processors were binary compatible, they would be as power-hungry and inefficient as Intel processors! Our chips achieve a 7-to-1 efficiency improvement in large part because we eschewed binary compatibility with Intel and designed a modern, highly area- and power-efficient processor core. Recompiling apps is a small price to pay for the considerable cost savings that will result from employing our tile-based processors. The key is to support standard operating system environments and programming languages such as C and C++.
Question: Can this tile-based architecture scale as well as conventional CPUs?
Answer: Absolutely – it scales better than any other processor architecture out there. In our short time out of stealth mode, we have more than quadrupled performance from the TILE64 to the TILE-Gx. This architecture is particularly easy to scale because the interconnect is two-dimensional, so you automatically get much more bandwidth as you add more cores.
Intel predicted in the late 1990s that by 2011 it would have a 10 GHz single-core processor. But all processor makers quickly realized that multicore was the only viable way to stay on the performance curve implied by Moore's Law while keeping power consumption reasonable. So to the extent that CPU makers are able to stay on that curve, they will need to adopt many of our techniques.
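A rough way to see why a 2-D mesh gains bandwidth as cores are added is to count links. The sketch below assumes a square k x k array with no wraparound and counts links in a single mesh only, ignoring link width, clock rate and the fact that the chip has five meshes; the function names are hypothetical. A shared bus, by contrast, remains one medium no matter how many cores attach to it.

```python
import math


def side_length(cores: int) -> int:
    """Side k of a square k x k mesh; the sketch only handles square core counts."""
    k = math.isqrt(cores)
    if k * k != cores:
        raise ValueError("this sketch assumes a square number of cores")
    return k


def mesh_links(cores: int) -> int:
    """Neighbor-to-neighbor links: k-1 in each of k rows plus k-1 in each of k columns."""
    k = side_length(cores)
    return 2 * k * (k - 1)


def bisection_links(cores: int) -> int:
    """Links crossed by a vertical cut through the middle of the array: one per row."""
    return side_length(cores)


if __name__ == "__main__":
    # Core counts of the TILE-Gx family mentioned in the interview.
    for n in (16, 36, 64, 100):
        print(f"{n:>3} cores: {mesh_links(n):>3} links, bisection {bisection_links(n):>2}")
```

Total links grow from 24 at 16 cores to 180 at 100 cores, approaching two links per core, and the bisection grows from 4 to 10 links (as the square root of the core count), while a bus stays fixed.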
Question: How does the Gx100 compare with Intel's 48-core research prototype announced last December?
Answer: Intel recently unveiled a 48-core chip that is also based on tiles. In many respects it is architecturally similar to ours, for example in its use of a mesh interconnect. But it is only a research prototype and has no I/O to speak of. Moreover, it isn't cache coherent, so it can't run software that requires cache coherency, such as SMP Linux. We are already on the third generation of this architecture, whereas they are still in the lab, so we effectively have a four-year lead over Intel. But the fact that they developed a chip similar to ours clearly shows the validity of our approach.
Question: Is this chip suitable for mobile applications?
Answer: While our architecture scales down to very small core counts, Tilera hasn't targeted the mobile computing market yet. We do, however, offer a TILE-Gx16 chip, which is suitable for more cost-sensitive endpoint applications such as videoconference systems, small-office security appliances or smaller 4G wireless base stations.
Question: Some researchers believe that 3-D computing is the future. Will it be feasible to stack your chips in a cube?
Answer: Yes, 3-D technology is highly amenable to our approach. 3-D processor architectures will be the next stage of computer development once 2-D scaling hits a wall and foundries can cost-effectively manufacture 3-D substrates. But chips in a 3-D cube will need to be power-efficient and low-power to allow proper heat dissipation, so our architecture is ideally suited to 3-D scaling.
Question: What sort of products will Tilera be offering in 2020?
Answer: Over the next decade, it is clear that the industry will continue to shift from multicore to “many-core”. Processors will go from having four cores to having hundreds. By 2020, Tilera should be offering 1,000-core chips, perhaps in 3-D cubes that contain both processing and memory. Even mobile electronics such as cellphones could be using dozens of cores.
We also believe that the future of computing lies in homogeneous computing, where all the cores are the same. This simplifies the programming model and provides the greatest flexibility for allocating compute to the varying workloads that product vendors must deliver.