The computer chips of the future are likely to have hundreds or even thousands of cores. For chip designers, predicting how these massively multicore chips will behave is no easy task. Software simulations work up to a point, but more accurate simulations typically require hardware models — programmable chips that can be reconfigured to mimic the behavior of multicore chips.
Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) presented a new method for improving the efficiency of hardware simulations of multicore chips. Unlike competing methods, it guarantees that the simulator won’t go into “deadlock” — a state in which cores get stuck waiting for each other to relinquish system resources, such as memory. The method should also make it easier for designers to develop simulations and for outside observers to understand what those simulations are intended to do.
Hardware simulations of multicore chips typically use devices called field-programmable gate arrays, or FPGAs. An FPGA is a chip with an array of simple circuits and memory cells that can be hooked together in novel configurations after the chip has left the factory. The chips sold by some small-market manufacturers are, in fact, specially configured FPGAs.
They developed a circuit design that allows the ratio between real clock cycles and simulated cycles to fluctuate as needed. That allows for faster simulations and more economical use of the FPGA’s circuitry.
Every logic circuit has some number of input wires and some number of output wires, and the CSAIL researchers associate a little bit of memory with each such wire. Data coming in on a wire is stored in memory until all the operations that require it have been performed; data going out on a wire is stored in memory until the data going out on the other wires has been computed, too. Once all the outputs have been determined, the input data is erased, signaling the completion of one simulated clock cycle. Depending on the complexity of the calculation the circuit was performing, the simulated clock cycle could correspond to one real clock cycle, or eight, or something in between.
The memory associated with the input and output wires and the logic to control it does take up some extra real estate on the FPGA, but “you only do it for select parts of the design, like complex logic, or these memories that you just cannot fit on chip on FPGAs,” Khan says. “And remember, through this overhead, you are able to save a lot of resource” — that is, FPGA circuitry. Moreover, it’s the memory system that allows the researchers to guarantee that their simulator won’t deadlock. Other research groups have developed FPGA multicore-simulation systems that use space and time efficiently, but they don’t offer that guarantee.
Another advantage of their system, the CSAIL researchers argue, is that it makes it easier for outside observers — and even for chip designers themselves — to understand what a simulation is intended to do. With other researchers’ simulators, it’s often the case that “the cycle-level specification for the machine that they’re modeling is in their heads,” Khan says. “What we’re proposing is, instead of having this in your head, let’s start with a specification. Let’s write it down formally but in a language that is at a very high level of abstraction, so it does not require you to write a lot of details. And once you have this specification that clearly tells you how the entire multicore model is going to behave every cycle, you can transform this automatically into an efficient mapping on the FPGA