"Stone-scissors-paper" and game theory


The game rock-paper-scissors is great for deciding who has to take out the garbage. But have you noticed what happens when, instead of stopping after three throws, the game continues round after round? At first you settle on a pattern that gives you an edge, but your opponent quickly catches on and turns it to their advantage. As the strategies keep shifting, you gradually reach a point at which neither side can improve any further. Why does this happen?

In the 1950s, the mathematician John Nash proved that in any game with a finite number of players and a finite number of options (such as rock-paper-scissors) there always exists a mix of strategies in which no player can do better by changing only their own strategy. The theory of such stable sets of strategies, called "Nash equilibria," revolutionized the field of game theory, changed the course of economics, and altered the way everything from political treaties to network traffic is studied and analyzed. It also earned Nash the Nobel Prize in 1994.

So what does a Nash equilibrium look like in rock-paper-scissors? Let's model a situation in which you (Player A) and your opponent (Player B) play the game again and again. In each round, the winner earns a point, the loser forfeits a point, and a draw counts as zero points.
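The scoring rule just described is easy to pin down in code. Here is a minimal sketch in Python; the move encoding and names are my own, not from the original article:

```python
# Moves are encoded as 0 = rock, 1 = paper, 2 = scissors (an assumed convention).
# PAYOFF_A[a][b] is Player A's score when A plays move a and B plays move b:
# winner +1, loser -1, draw 0.
PAYOFF_A = [
    [0, -1, 1],   # A plays rock:     draw vs rock, loses to paper, beats scissors
    [1, 0, -1],   # A plays paper:    beats rock, draw vs paper, loses to scissors
    [-1, 1, 0],   # A plays scissors: loses to rock, beats paper, draw vs scissors
]

def round_score(a, b):
    """Points Player A receives for a single round."""
    return PAYOFF_A[a][b]
```

Because the game is zero-sum, Player B's score in any round is simply `-round_score(a, b)`.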

Suppose Player B adopts the (foolish) strategy of choosing paper in every round. After a few rounds of wins, losses, and draws, you will most likely notice the pattern and adopt the winning counterstrategy of choosing scissors every round. Let's call this set of strategies (scissors, paper). If every round comes out scissors against paper, you cut your way to a flawless victory.

But Player B soon notices the short-sightedness of this set of strategies. Seeing that you keep choosing scissors, he switches to the strategy of always choosing rock. The set of strategies (scissors, rock) now wins for Player B. But of course, you will now switch to paper. During these stages of the game, Players A and B are using what are called "pure" strategies: a single strategy that is chosen and played continually.

Clearly, no equilibrium can be reached this way: for every pure strategy, such as "always choose rock," there is a counterstrategy, such as "always choose paper," that beats it and forces yet another change of strategy. You and your opponent will forever chase each other around the circle of strategies.

But you can also try a "mixed" strategy. Suppose that instead of committing to a single pure strategy, you randomly choose one of the pure strategies in each round. Instead of "always choose rock," a mixed strategy might be "choose rock half the time and scissors the other half." Nash proved that when such mixed strategies are allowed, every game like this must have at least one equilibrium point. Let's find it.

What is a sensible mixed strategy for rock-paper-scissors? Intuitively, a reasonable choice is "pick rock, paper, or scissors with equal probability." We write this strategy as (1/3, 1/3, 1/3), meaning rock, paper, and scissors are each chosen with probability 1/3. Is this a good strategy?

Suppose your opponent's strategy is "always choose rock." This is a pure strategy, which we can describe as (1, 0, 0). How will the game play out with the strategies (1/3, 1/3, 1/3) for Player A and (1, 0, 0) for Player B?
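A mixed strategy is just a probability distribution over the three moves, and a pure strategy is the degenerate case with all its weight on one move. A small sketch of sampling one round under these two strategies (the function name is my own):

```python
import random

def sample_move(strategy):
    """Draw one move (0 = rock, 1 = paper, 2 = scissors) from a mixed strategy."""
    return random.choices([0, 1, 2], weights=strategy)[0]

uniform_a = (1/3, 1/3, 1/3)   # Player A mixes evenly over the three moves
always_rock_b = (1, 0, 0)     # Player B's pure strategy, written as a mixture

a_move = sample_move(uniform_a)       # random: 0, 1, or 2
b_move = sample_move(always_rock_b)   # always 0, i.e. rock
```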

To get a clearer picture of the game, let's build a table showing the probability of each of the nine possible outcomes of a round: rock for A, rock for B; rock for A, paper for B; and so on. In the table below, the top row gives Player B's choice and the left column gives Player A's choice.

| A \ B | Rock | Paper | Scissors |
| --- | --- | --- | --- |
| Rock | 1/3 | 0 | 0 |
| Paper | 1/3 | 0 | 0 |
| Scissors | 1/3 | 0 | 0 |

Each entry of the table gives the probability of that pair of choices occurring in a round. It is simply the product of the probabilities that each player makes the corresponding choice. For example, the probability that Player A chooses paper is 1/3, and the probability that Player B chooses rock is 1, so the probability of (paper for A, rock for B) is 1/3 × 1 = 1/3. But the probability of (paper for A, scissors for B) is 1/3 × 0 = 0, since the probability that Player B chooses scissors is zero.

How will Player A fare with this set of strategies? Player A will win one third of the time (paper, rock), lose one third of the time (scissors, rock), and draw one third of the time (rock, rock). We can compute the average number of points Player A earns per round by summing the product of each outcome and its probability:

(1/3)(1) + (1/3)(0) + (1/3)(-1) = 0


So on average, Player A earns 0 points per round. You win, lose, and draw with equal probability, and over time the wins and losses balance out; in effect, both players come out even.
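This averaging is just a probability-weighted sum over the nine cells, so it can be checked mechanically. A sketch, assuming the standard +1/0/-1 payoffs described earlier:

```python
# PAYOFF_A[i][j]: Player A's points when A plays i and B plays j
# (0 = rock, 1 = paper, 2 = scissors).
PAYOFF_A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def expected_score(p_a, p_b):
    """Player A's average points per round: sum of (joint probability x payoff)."""
    return sum(p_a[i] * p_b[j] * PAYOFF_A[i][j]
               for i in range(3) for j in range(3))

# Uniform mixing against "always rock" averages out to zero, as computed above.
print(expected_score((1/3, 1/3, 1/3), (1, 0, 0)))  # 0.0
```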

But as we've said, you can improve your results by changing your strategy, assuming your opponent does not change theirs. If you switch to the strategy (0, 1, 0) ("choose paper every time"), the probability table will look like this:
| A \ B | Rock | Paper | Scissors |
| --- | --- | --- | --- |
| Rock | 0 | 0 | 0 |
| Paper | 1 | 0 | 0 |
| Scissors | 0 | 0 | 0 |

In every round your paper will cover your opponent's rock, earning you one point per round.

So this pair of strategies, (1/3, 1/3, 1/3) for A and (1, 0, 0) for B, is not a Nash equilibrium: you, as Player A, can improve your results by changing your strategy.

As we have seen, pure strategies do not lead to equilibrium. But what if your opponent tries a mixed strategy, say (1/2, 1/4, 1/4)? This is the strategy "choose rock half the time, and paper and scissors a quarter of the time each." Here is the probability table when Player A sticks with (1/3, 1/3, 1/3):
| A \ B | Rock | Paper | Scissors |
| --- | --- | --- | --- |
| Rock | 1/6 | 1/12 | 1/12 |
| Paper | 1/6 | 1/12 | 1/12 |
| Scissors | 1/6 | 1/12 | 1/12 |

And here is the table of payoffs from Player A's point of view: the number of points Player A receives for each outcome.
| A \ B | Rock | Paper | Scissors |
| --- | --- | --- | --- |
| Rock | 0 | -1 | 1 |
| Paper | 1 | 0 | -1 |
| Scissors | -1 | 1 | 0 |

Multiplying the two tables entry by entry and summing, we compute the average number of points Player A earns per round:

(1/6)(0) + (1/12)(-1) + (1/12)(1) + (1/6)(1) + (1/12)(0) + (1/12)(-1) + (1/6)(-1) + (1/12)(1) + (1/12)(0) = 0


On average, Player A earns 0 points per round. As before, this set of strategies, (1/3, 1/3, 1/3) for A and (1/2, 1/4, 1/4) for B, ends in a draw.

But also as before, you, as Player A, can improve your results by changing strategy: against Player B's (1/2, 1/4, 1/4), Player A can switch to, say, (1/4, 1/2, 1/4). Here is the probability table:

| A \ B | Rock | Paper | Scissors |
| --- | --- | --- | --- |
| Rock | 1/8 | 1/16 | 1/16 |
| Paper | 1/4 | 1/8 | 1/8 |
| Scissors | 1/8 | 1/16 | 1/16 |

and here is the final result for A:

(1/8)(0) + (1/16)(-1) + (1/16)(1) + (1/4)(1) + (1/8)(0) + (1/8)(-1) + (1/8)(-1) + (1/16)(1) + (1/16)(0) = 1/16


So this set of strategies, (1/4, 1/2, 1/4) for A and (1/2, 1/4, 1/4) for B, gives Player A 1/16 of a point per round on average. After 100 games, Player A would be ahead by 6.25 points, a strong incentive for A to switch. Thus the set of strategies (1/3, 1/3, 1/3) for A and (1/2, 1/4, 1/4) for B is not a Nash equilibrium either.
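The 1/16 figure can be verified numerically. A self-contained sketch of the same probability-weighted sum, with moves encoded 0 = rock, 1 = paper, 2 = scissors:

```python
# PAYOFF_A[i][j]: Player A's points when A plays i and B plays j.
PAYOFF_A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def expected_score(p_a, p_b):
    """Player A's average points per round under mixed strategies p_a and p_b."""
    return sum(p_a[i] * p_b[j] * PAYOFF_A[i][j]
               for i in range(3) for j in range(3))

# (1/4, 1/2, 1/4) for A against (1/2, 1/4, 1/4) for B:
print(expected_score((1/4, 1/2, 1/4), (1/2, 1/4, 1/4)))  # 0.0625, i.e. 1/16
```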

Now let's consider the pair of strategies (1/3, 1/3, 1/3) for A and (1/3, 1/3, 1/3) for B. Here is the corresponding probability table:
| A \ B | Rock | Paper | Scissors |
| --- | --- | --- | --- |
| Rock | 1/9 | 1/9 | 1/9 |
| Paper | 1/9 | 1/9 | 1/9 |
| Scissors | 1/9 | 1/9 | 1/9 |

Thanks to symmetry, we can quickly calculate the overall result:

(1/9)(0) + (1/9)(-1) + (1/9)(1) + (1/9)(1) + (1/9)(0) + (1/9)(-1) + (1/9)(-1) + (1/9)(1) + (1/9)(0) = 0


Once again you and your opponent come to a draw. But the difference here is that neither player has any incentive to change strategies! If Player B moved to any unbalanced strategy in which one option (say, rock) were chosen more often than the others, Player A would simply change strategy to choose paper more often, ultimately producing a positive average result for Player A each round. That is exactly what happened when Player A chose the strategy (1/4, 1/2, 1/4) against Player B's (1/2, 1/4, 1/4).

Of course, if Player A moves from (1/3, 1/3, 1/3) to an unbalanced strategy, Player B can likewise take advantage. So neither player can improve their results by changing only their own strategy. The game has reached a Nash equilibrium.
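One way to check the equilibrium claim is to verify that no pure move does better than any other against the uniform mixture; since a mixed strategy's payoff is a weighted average of its pure moves' payoffs, no mixed deviation can help either. A sketch, under the same move encoding as before:

```python
# PAYOFF_A[i][j]: Player A's points when A plays i and B plays j
# (0 = rock, 1 = paper, 2 = scissors).
PAYOFF_A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def pure_payoffs_against(p_opponent):
    """Expected score of each pure move against an opponent's mixed strategy."""
    return [sum(p_opponent[j] * PAYOFF_A[i][j] for j in range(3))
            for i in range(3)]

# Every pure move earns 0 on average against (1/3, 1/3, 1/3), so Player A
# cannot gain by deviating; by symmetry the same holds for Player B.
print(pure_payoffs_against((1/3, 1/3, 1/3)))  # [0.0, 0.0, 0.0]
```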

Nash's proof that such games always have equilibria of this kind matters for several reasons. One is that many real-world situations can be modeled as games. Whenever a group of people must weigh personal against collective gain, as in negotiations or in competition for shared resources, you find strategies being used and payoffs being evaluated. The ubiquity of this mathematical model is part of why Nash's work has been so influential.

Another reason is that a Nash equilibrium is, in a sense, a positive outcome for every player. Once it is reached, no individual player can improve their results by changing only their own strategy. There may be better collective outcomes achievable through perfect cooperation, but if all you can control is yourself, a Nash equilibrium is the best outcome you can reach.

Hence we might hope that "games" such as economic incentive packages, tax codes, contract terms, and network designs will settle into Nash equilibria, where individuals acting in their own interest arrive at an outcome acceptable to all and the system becomes stable. But when we play such games, is it reasonable to assume that players will naturally arrive at a Nash equilibrium?

It is tempting to think so. In our rock-paper-scissors game, we could guess right away that neither player could do better than to play completely at random. But that is partly because every player's preferences are known to every other player: everyone knows exactly how much everyone else wins or loses in each outcome. What if preferences are hidden and more complex?

Imagine a new game in which Player B earns three points for a win against scissors and one point for any other win. This changes the mixed strategy: Player B will choose rock more often, hoping for the triple reward whenever Player A picks scissors. And although the difference in points does not directly affect Player A's payoffs, the resulting change in Player B's strategy triggers a new counterstrategy from A.
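To see the pull toward rock concretely, here is a sketch of Player B's side of this variant. The payoff values (3 for rock over scissors, 1 for any other win, -1 for a loss, 0 for a draw) follow the text, and the uniform opponent is assumed for illustration:

```python
# PAYOFF_B[i][j]: Player B's points when B plays i and A plays j
# (0 = rock, 1 = paper, 2 = scissors). Variant rule: a win with rock
# over scissors is worth 3 instead of 1.
PAYOFF_B = [
    [0, -1, 3],   # B plays rock:     draw, loses to paper, triple win over scissors
    [1, 0, -1],   # B plays paper:    beats rock, draw, loses to scissors
    [-1, 1, 0],   # B plays scissors: loses to rock, beats paper, draw
]

def b_pure_payoffs(p_a):
    """Player B's expected points for each pure move against A's mixed strategy."""
    return [sum(p_a[j] * PAYOFF_B[i][j] for j in range(3)) for i in range(3)]

# Against a uniform Player A, rock now averages (0 - 1 + 3)/3 = 2/3 per round,
# while paper and scissors still average 0, so B is drawn toward rock.
print(b_pure_payoffs((1/3, 1/3, 1/3)))
```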

And if every one of Player B's payoffs were different and hidden, it would take Player A some time to work out Player B's strategy. Many rounds would have to pass before Player A could figure out, for example, how often Player B chooses rock, in order to determine how often to choose paper.

Now imagine that 100 people are playing rock-paper-scissors, each with a different, secret set of payoffs, each depending on how many of their 99 opponents they beat with rock, paper, or scissors. How long would it take just to compute the correct frequencies of rock, paper, and scissors needed to reach an equilibrium point? Probably a very long time. Perhaps longer than the game itself lasts. Perhaps longer than the lifetime of the universe!

At the very least, it is far from obvious that even perfectly rational, deliberate players, choosing good strategies and acting in their own interest, will end up at the game's equilibrium. This idea underlies a paper posted online in 2016, which proves that there is no general method that could lead players even to an approximate Nash equilibrium in every game. This is not to say that perfect players never reach equilibrium in games; they often do. It just means there is no reason to believe that simply because a game is being played by perfect players, an equilibrium will be reached.

When we design a transportation network, we might hope that the players, drivers and pedestrians each seeking the fastest way home, will collectively reach an equilibrium in which nothing is gained by choosing a different route. We might hope that John Nash's invisible hand will guide them so that their competing and cooperating interests, finding the shortest possible route while avoiding traffic jams, produce an equilibrium.

But our ever-more-complicated game of rock-paper-scissors shows that such hopes may be misplaced. The invisible hand may guide some games, but others resist it, trapping players in an endless competition for a prize that stays forever out of reach.

Exercises


  1. Suppose Player B plays the mixed strategy (1/2, 1/2, 0). Which mixed strategy should Player A choose to maximize his winnings in the long run?
  2. Suppose Player B plays the mixed strategy (1/6, 2/6, 3/6). Which mixed strategy should Player A choose to maximize his winnings in the long run?
  3. How might the dynamics of the game change if each player were awarded a point for a draw?

Source: https://habr.com/ru/post/411523/

