# Prisoner Dilemma, which was used Cold War Politics Strategizing, has Winning Solutions

Technology Review – The world of game theory is currently on fire. In May, Freeman Dyson at Princeton University and William Press at the University of Texas announced that they had discovered a previously unknown strategy for the game of prisoner’s dilemma which guarantees one player a better outcome than the other.

That’s a monumental surprise. Theorists have studied Prisoner’s Dilemma for decades, using it as a model for the emergence of co-operation in nature. This work has had a profound impact on disciplines such as economics, evolutionary biology and, of course, game theory itself.

The game is this: imagine Alice and Bob have committed a crime and are arrested. The police offer each one a deal–snitch and you go free while your friend does 6 months in jail. If both Alice and Bob snitch, they both get 3 months in jail. If they both remain silent, they both get one month in jail for a lesser offence.

What should Alice and Bob do?

If they co-operate, they both spend only one month in jail. Nevertheless, in a single game, the best strategy is to snitch because it guarantees that you don’t get the maximum jail term.

However, the game gets more interesting when played in repeated rounds because players who have been betrayed in one round have the chance to get their own back in the next iteration.

Until now, everyone thought the best strategy in iterative prisoner’s dilemma was to copy your opponents behaviour in the previous round. This tit-for-tat approach guarantees that you both spend the same time in jail.

That conclusion was based on decades of computer simulations and a certain blind faith in the symmetry of the solution.

So the news that there are other strategies that allow one player to not only beat the other but to determine their time in jail is nothing short of revolutionary.

The new approach is called the zero determinant strategy (because it involves the process of setting a mathematical object called a determinant to zero).

Zero Determinant (ZD) strategies are a new class of probabilistic and conditional strategies that are able to unilaterally set the expected payoff of an opponent in iterated plays of the Prisoner’s Dilemma irrespective of the opponent’s strategy, or else to set the ratio be- tween a ZD player’s and their opponent’s expected payo ff. Here we show that while ZD strategies are weakly dominant, they are not evolutionarily stable and will instead evolve into less coercive strategies. We suggest that ZD strategies with an informational advantage over other players that allows them to recognize other ZD strategies will be evolutionarily stable (and able to exploit other players). However, such an advantage is bound to be short-lived as opposing strategies evolve to counteract the recognition.

Clearly, winning against your opponents isn’t everything if this impairs the payo ff against similar or identical strategies. But what if a strategy could recognize who they play against, and switch strategies depending on the nature of the opponent? For example, such a strategy would play ZD against others, but cooperate with other ZD strategists instead.

Riolo et al. designed a game where agents could donate costly resources only to
players that were su fficiently similar to them (given a tag). This was later abstracted into a model in which players can use diff erent payoff matrices (such as those for the Prisoner’s Dilemma or the Stag-Hunt game) depending on the tag of
the opponent. Recognizing another player’s identity can in principle be accomplished in two ways: the players can simply record an opponent’s tag and select a strategy accordingly, or they can try to recognize a strategy by probing the opponent with particular plays. When using tags, it is possible that players cheat by imitating the tag of the opponent (in that case it is necessary for players to agree
on a new tag so that they can continue to reliably recognize each other). Tag-based recognition is used to enhance cooperation among animals via the so-called green-beard” eff ect, and can give rise to cycles between mutualism and altruism. Recognizing a strategy from behavior is discussed below. Note that whether a player’s strategy is identifi ed by a tag or learned from interaction, in both cases
it is communication that enables cooperation.

For ZD strategists to stably and reliably outcompete other strategies, they have to have an informational advantage. This extra information” can be obtained by using a tag to recognize each other and conditionally cooperate or play ZD depending on this tag, or by having a longer-memory strategy that a player can use to prove the opponent’s play. Of course, such an information-based dominance is itself vulnerable to the evolution of interfering mechanisms by the exploited strategies, either by imitating the tag (and thus destroying the information channel) or by evolving longer memories themselves. Needless to say, this type of evolutionary arms race
has been, and will be, observed throughout the biosphere.