The Best AI Go Programs Have Had Large Flaws Exposed, Which Means AI Needs More Testing and Care

AI programs that play the strategy game of Go were believed to have gone far beyond human capability by 2016. In March 2016, AlphaGo beat human world champion Lee Sedol in a five-game match, the first time a computer Go program had beaten a 9-dan professional without handicap. Although AlphaGo lost the fourth game to Lee Sedol, Lee resigned the final game, giving a final score of 4 games to 1 in favor of AlphaGo. In recognition of the victory, AlphaGo was awarded an honorary 9-dan rank by the Korea Baduk Association.

In 2023, a new adversarial computer program was used to probe for weaknesses in the top Go-playing AI systems. Weaknesses were found, and they were simple enough to exploit that they could be taught to an amateur player. An amateur would have no chance against the top human Go players, and those top humans have themselves been beaten by the AI software, yet without any comprehensive knowledge of what weaknesses the AI programs have. The new attacking AI program found large, repeatedly exploitable flaws in how the Go AIs play.

The flaws in the programs suggest that no actual understanding is being replicated in the AI. The programming and techniques do well at mimicking intelligence and producing patterns of very strong play, but flaws can be found with enough testing. The game of Go has complexity far beyond brute-force supercomputing, and the AI Go programs have been shown to be far from perfect. The same could be true of GPT-4 and other new generative AI systems.

The results show that improvements in capabilities do not always translate into adequate robustness. Failures in Go AI systems are entertaining, but similar failures in safety-critical systems like automated financial trading or autonomous vehicles could have dire consequences. The ML research community should invest in improving robust training and adversarial defense techniques in order to produce models with the high levels of reliability needed for safety-critical systems.

A Go expert (Kellin Pelrine) was able to learn and apply the cyclic-adversary’s strategy to attack multiple types and configurations of AI Go systems. He exploited KataGo running 100K visits of search, a setting that would normally be strongly superhuman. Apart from previously studying the adversary’s game records, he used no algorithmic assistance in this or any of the following examples. The KataGo network and weights used were b18c384nbt-uec, a newly released version that KataGo’s author, David Wu, trained for a tournament. This network should be as strong as or stronger than the “Latest” network targeted in the paper.

Playing under standard human conditions on the online Go server KGS, the same Go expert (Kellin Pelrine) successfully exploited the bot JBXKata005 in 14/15 games. In the remaining game, the cyclic group attack still led to a successful capture, but the victim had enough points remaining to win. This bot uses a custom KataGo implementation, and at the time of the games was the strongest bot available to play on KGS.

The same Go expert (Kellin Pelrine) exploited JBXKata005 while giving it a huge initial advantage through a nine-stone handicap. A top-level human player with this much of an advantage would have a virtually 100% win rate against any opponent, human or algorithmic.

While Go AIs already have known weaknesses, for instance the common “ladder” tactic, there are three key factors here whose confluence makes this vulnerability different:
1. This affects top AIs, including when they have a very large amount of search.
2. The attack works consistently to produce a game-winning advantage.
3. This consistency does not require repeating exact sequences or board positions.

KataGo is now being trained on numerous positions from these adversarial games. There are clear improvements, but so far it is still vulnerable. Fixing an issue like this is not as easy as one might hope.

The current best AI Go program, KataGo, plays at a strongly superhuman level. KataGo has beaten ELF OpenGo and Leela Zero, which are themselves superhuman. KataGo without search plays at the level of a top-100 European player, and with 128 or more visits of search per move it is superhuman. The new attacking program’s exploit scales far beyond this level, achieving a 72% win rate against KataGo playing with ten million visits of search per move.
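To give a concrete sense of what “visits of search per move” means in practice, below is a minimal sketch of driving a local KataGo engine over the standard GTP protocol with the visit budget capped. The binary name, config file, model filename, and the -override-config usage shown here are assumptions about a typical KataGo installation, not a tested recipe from the paper.

```python
# Minimal sketch: launch KataGo in GTP mode with a capped search budget.
# File names and the -override-config flag are assumed; adjust for your setup.
import subprocess

engine = subprocess.Popen(
    [
        "katago", "gtp",
        "-config", "gtp_example.cfg",        # assumed example config shipped with KataGo
        "-model", "b18c384nbt-uec.bin.gz",   # network name referenced in the article
        "-override-config", "maxVisits=128", # the "visits of search per move" budget
    ],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def gtp(command: str) -> str:
    """Send one GTP command and return the engine's response (ends with a blank line)."""
    engine.stdin.write(command + "\n")
    engine.stdin.flush()
    lines = []
    while True:
        line = engine.stdout.readline().rstrip()
        if line == "":
            break
        lines.append(line)
    return "\n".join(lines)

print(gtp("boardsize 19"))
print(gtp("komi 7.5"))
print(gtp("play B Q16"))   # a human (or adversary) move
print(gtp("genmove W"))    # KataGo replies using at most 128 visits
gtp("quit")
```

Raising maxVisits (to 100K, or the ten million used in the strongest experiments) increases search depth and strength, which is why it is notable that the attack still works at those settings.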

The new paper and its attacking program make three contributions.

1. They introduce a novel attack method, hybridizing the attack of Gleave et al. (2020) with AlphaZero-style training (Silver et al., 2018); a conceptual sketch of this setup follows the list.
2. They demonstrate the existence of two distinct adversarial policies against the state-of-the-art Go AI system, KataGo.
3. They provide a detailed empirical investigation into these adversarial policies, including showing they partially transfer to other Go AIs and learn interpretable strategies that can be replicated by experts under standard human playing conditions.
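To make the first contribution more concrete, here is a rough conceptual sketch of that hybrid setup: the victim policy stays frozen while the adversary is trained, AlphaZero-style, only on its own turns in games played against the victim. Every class and function below (Game, victim_move, adversary_move, train_step) is an illustrative stand-in, not the authors’ code.

```python
# Conceptual sketch of adversarial-policy training against a frozen victim.
# All names here are illustrative stand-ins, not the authors' implementation.
import random
from dataclasses import dataclass, field

@dataclass
class Game:
    """Toy stand-in for a Go game; a real version would track the board state."""
    moves: list = field(default_factory=list)

    def legal_moves(self):
        return list(range(10))          # placeholder move set

    def play(self, move):
        self.moves.append(move)

    def finished(self):
        return len(self.moves) >= 20    # placeholder termination rule

    def adversary_won(self):
        return random.random() < 0.5    # placeholder outcome

def victim_move(game):
    """Frozen victim: in the real system this is KataGo's policy (plus search)."""
    return random.choice(game.legal_moves())

def adversary_move(game, net):
    """Adversary move; the paper-style setup uses AlphaZero-like search guided by
    the trainable adversary network, modelling the victim's replies with the
    frozen victim network. Here it is just a random stub."""
    return random.choice(game.legal_moves())

def train_step(net, batch):
    """Placeholder for the AlphaZero-style policy/value update on adversary data."""
    return net

adversary_net = object()                 # stand-in for the trainable network
for iteration in range(3):               # outer training loop
    batch = []
    for _ in range(8):                   # play games: adversary vs. frozen victim
        game, adversary_to_move, turns = Game(), True, []
        while not game.finished():
            if adversary_to_move:
                move = adversary_move(game, adversary_net)
                turns.append((list(game.moves), move))   # train only on adversary turns
            else:
                move = victim_move(game)                 # victim is queried, never updated
            game.play(move)
            adversary_to_move = not adversary_to_move
        result = game.adversary_won()
        batch += [(state, move, result) for state, move in turns]
    adversary_net = train_step(adversary_net, batch)     # only the adversary improves
```

The design choice this sketch tries to convey is that the victim never learns from these games, so the adversary can keep probing and exploiting the same blind spot.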

The paper demonstrates that even superhuman agents can be vulnerable to adversarial policies. However, the results do not establish how common such vulnerabilities are: it is possible that Go-playing AI systems are unusually vulnerable. A promising direction for future work is to evaluate the attack against strong AI systems in other games.