Clemson Student Uses Artificial Intelligence to Gain 10x Speedup on Altera Cyclone FPGA

Sometimes it’s hard to explain how much faster software logic can run when it is deployed in hardware. In a clever benchmarking experiment that incorporates simple logic refactoring, artificial intelligence, and machine compilation from software to FPGA hardware, a Clemson student took a popular board game and turbo-charged it 10x in an Altera Cyclone FPGA.

“Blokus-Duo” is a two-player strategy-based board game. It is a compact version of the original Blokus game. Its small size, progressive logic, and strategy-based approach makes Blokus-Duo a highly suitable application for performance evaluation of various Artificial Intelligence (AI) techniques on limited resource platforms like smaller Field Programmable Gate Arrays (FPGAs)

This implementation involves each player running an AI routine, either on an FPGA or a computer, with a host machine coordinating the two players. The players communicate with the host machine using an RS-232C (D-Sub 9-pin) serial port interface. Each move is transmitted as a code, which is subsequently decoded by the host machine. The host machine then informs the other player of its opponent’s move. Besides coordinating, the host machine monitors the status of the game including violations by any player as well as final outcomes.

A minimax-based game tree algorithm is employed with alpha-beta pruning for the AI routine. The Altera DE2 115 development board with a Cyclone IV E FPGA is used as the target platform for the implementation.

When the game tree is searched to a depth of 2, the speedup achieved is around 11X. The speedup varies during the course of the game as the load pattern changes. The speedup mentioned above is one among the higher values achieved when there are maximum moves to search.

When the game tree is searched to a depth of 3, the speedup achieved is around 4X. The decrease in speedup can be attributed to the following reason. Pruning of the game tree search occurs best when the latest values of alpha and beta are used, which happens in case of the sequential software solution. However, due the distributed architecture of the FPGA implementation, each process communicates only with its parent process. This results in various processes using older values of alpha and beta, thus leading to inefficient pruning and redundant search overhead.

The code for this example is available for qualified researchers.

Future work
Further acceleration is possible through algorithmic improvements to the minimax algorithm, including techniques like aspiration search, iterative deepening, and distributed tree search. Architectural improvements involve exploring communication and computation trade-offs.


If you liked this article, please give it a quick review on ycombinator or StumbleUpon. Thanks