Making decisions

Lecture



This section deals with games with two players, which will be referred to as MAX and MIN for reasons that will soon become apparent. The MAX player moves first, after which the players take turns in turns until the game ends. At the end of the game, the winning player is awarded points, and a penalty is imposed on the loser. A game can be formally defined as a sort of search task with the components described below.

  • The initial state , which includes the position on the board and determines the player who must walk.
  • The function of determining the successor , returning a list of pairs (move, state), each of which indicates the allowable move and the resulting state.
  • Checking the terminal state , which determines that the game is over. The states in which the game is finished are called terminal states.
  • A utility function (also called an objective or reward function) that reports the numeric value of the terminal states. In chess, the result is a win, loss or draw, with values ​​of +1, -1 or 0. Some games have a wider range of possible outcomes; for example, the number of points in backgammon can range from +192 to -192. This section focuses primarily on zero-sum games, although non-zero-sum games will also be briefly mentioned.

The initial state and allowable moves of each side determine the game tree for this game. From the initial state, the MAX player has nine possible moves. The moves alternate in such a way that max puts X, a MIN mark O, until the leaf nodes corresponding to the terminal states are reached, such that one player leaves three of his badges in one row or all the cells are filled.

The number under each leaf node indicates the utility value of the corresponding terminal state from the player's point of view max; it is assumed that high values ​​are favorable for the player MAX and unfavorable for the player MIN (which is why, in theory, these players are given such names). The MAX player’s task is to use a search tree (especially data on the usefulness of terminal states) to determine the best move.

Optimal strategists

When solving ordinary search problems, the optimal solution for the player MAX should be a sequence of moves leading to the goal - to the terminal state, which corresponds to the gain. On the other hand, the player MIN is also involved in the game, who has a different opinion on this. This means that the MAX player must find a reliable strategy to determine the MAH player’s move in the initial state, then the MAX player’s moves in the states resulting from any possible MIN player’s response, and then the MAX moves in the states that resulted from any possible MIN’s answer to moves, etc. Roughly speaking, the optimal strategy leads to a result that is at least as favorable as any other strategy, in those conditions when you have to play with an adversary who does not admit mistakes. First of all, let us consider how to find this optimal strategy, even though for MAX it will often be impossible to carry out its exhaustive computation in games that are more complicated than tic-tac-toe.

  Making decisions
(Partial) search tree for tic-tac-toe game. The top node represents the initial state, and the MAX player walks first, placing an X in an empty cell. This picture shows a part of the search tree in which alternating moves of players MIN (O icon) and MAX (X icon) are shown. The moves are performed until finally one of the terminal states is reached, which can be assigned utility data in accordance with the rules of the game.



















Even simple games such as tic-tac-toe are too complex to be able to bring the full game tree for them in this book, so let's move on to the description of the trivial game shown in the figure. Possible player codes MAX from the root node are labeled as ab2 and a3. Possible answers to ai for player MIN are b1, b2, b3, etc. This particular game ends after each player, MAX and MIN, makes one move each. (According to the terminology of game theory, this tree has a depth of one turn and consists of moves made by two participants, each of which is called a half-move.) The usefulness of the terminal states in this game ranges from 2 to 14.

  Making decisions
Tree game with two semi-moves. Nodes A are "MAX nodes" in which the turn of turn belongs to the MAX player, and the nodes are considered as "MIN nodes". Terminal nodes show utility values ​​for MAX; the remaining nodes are indicated by their minimax values. The best MAX player move from the root is a.1, because it leads to the successor with the highest minimax value, and the best answer MIN is hi, because it leads to the successor with the smallest minimax value













If there is a game tree, the optimal strategy can be determined by examining the minimax value of each node, which can be written as Minimax-Value (n). The minimax value of a node is the utility (for MAX) of staying in the corresponding state, provided that both players make moves in an optimal way from this node to the node that marks the end of the game. Obviously, the minimax value of the terminal state is simply its utility. Moreover, if there is a choice, the MAX player should prefer the move leading to the state with the maximum value, and the MIN player - preferring to the state with the minimum value. Therefore, the following relationship holds.

  Making decisions








Apply these definitions to the game tree shown in the figure. Terminal nodes at the lowest level are already indicated by numbers that indicate their usefulness. The first MIN node, denoted as B, has three successors with values ​​of 3, 12, and 8, so its minimax value is 3. Similarly, the other two MIN nodes have a minimax value of 2. The root node is the MAX node; his successors have minimax values ​​of 3, 2, and 2, so the root node itself has a minimax value of 3. You can also define the concept of a minimax decision made at the root of the tree: action a 1 is the optimal choice for the player max, since it leads to the successor with the highest minimax value .

In this definition of the optimal game for the MAX player, it is assumed that the MIN player also plays in an optimal way: he maximizes the result corresponding to the worst outcome of the game for the MAX. What would happen if the MIN player did not play optimally? In this case, you can easily show that the player MAX would achieve even more. There may be other strategies of the game against opponents, playing in a non-optimal way, which allow to achieve more than the minimax strategy; but these strategies are necessarily worse against opponents who play optimally.

Minimax algorithm

The minimax algorithm calculates the minimax solution from the current state. It uses a simple recursive calculation of the minimax values ​​of each successor state with the direct implementation of the governing equations. Recursion passes all levels up to the leaves of the tree, and then minimax values ​​are reserved throughout the tree as the recursion is reversed.

For example, in the tree shown in the figure, the algorithm first performed a recursion with going down to the three lower left nodes and applied the utility utility function to them to determine that the utility values ​​of these nodes are 3, 12, and 8, respectively. The algorithm then determines the minimum of these values, 3, and returns it as the reserved value of node B.

A similar process allows you to get the reserved values ​​of 2 for node C and 2 for node D. Finally, take the maximum of the values ​​3, 2 and 2 to get the reserved value 3 of the root node.


The algorithm for calculating minimax solutions. It returns the action corresponding to the best possible move, i.e. the action leading to the result with the best utility, in accordance with the assumption that the enemy is playing to minimize the utility. The Max-Value and Min-Value functions are used to pass through the entire tree of the game down to the leaves, which allows you to determine the reserved value of a certain state

function Minimax-Decision (state) returns action action
inputs: state, current game state
v <- Max-Value (state)
Return action action in the set of Successors (state) with the value v

function Max-Value (state) returns utility value
if Terminal-Test (state) then return Utility (state)
v <- ∞
for a, s in successors (state) do
v <- Max (v, Min-Value (s))
return v

function Min-Value (state) returns utility value
if Terminal-Test (state) then return Utility (state)
v <- ∞
for a, s in successors (state) do
v <- Min (v, Max-Value (s))
return v
This minimax algorithm performs a complete investigation of the depth game tree. If the maximum depth of the tree is equal to in and in each state there are b allowable moves, then the time complexity of this minimax algorithm is O (b m ). For the algorithm that forms all successors at once, the spatial complexity is O (bm), and for the algorithm that forms successors one by one, O (m). Of course, in real games such time costs are completely unacceptable, but this algorithm serves as the basis for the mathematical analysis of games, as well as more practical algorithms.

Optimal solutions in multi-player games Many popular games allow for more than two players. Let us consider how to extend the idea of ​​the minimax algorithm to games with several players. From the point of view of technical implementation, this is easy to do, but there are some new and interesting conceptual problems. First, you need to replace a single value for each node with a value vector. For example, in a game with three players, in which players A, B, and C participate, the vector is associated with each node. For terminal states, this vector defines the usefulness of this state from the point of view of each player. (In games with two players and a zero-sum, the two-element vector can be reduced to a single value, since the values ​​in it are always opposite.)

The simplest way to implement this approach is to use the Utility function, which returns a vector of utility values. Now it is necessary to consider non-terminal states. Consider a node in the game tree shown in the figure, which is designated as X. In this state, player C chooses what to do. Two options lead to terminal states with utility vectors and . Since 6 is more than 3, player C must choose the first move. This means that if state X is reached, the further game will lead to a terminal state with utilities . Therefore, the reserved value of X is this vector. Generally speaking, the reserved value of the node n is the utility vector of such a node, the successor, which has the highest value for
a player choosing a move at node n.

Anyone who plays multi-player games, such as Diplomacy ™ (Diplomacy), quickly finds out that there are a lot more things happening in them than in two-player games. In multi-player games, alliances are usually created between players, either formal or informal. In addition, sometimes as the game develops, alliances are then formed, then destroyed. How can you understand this behavior? Are alliances a natural consequence of choosing the best strategies for each player in a game with several players? As it turned out, they can really be such a consequence. For example, suppose that Ive players have weak positions, and player C has a stronger position.

In this case, both for A and B, it is often optimal to attack C, not each other, because otherwise C will destroy each of them separately. Thus, cooperation becomes the result of purely selfish behavior. Of course, as soon as player C weakens under the joint onslaught, the alliance will lose its meaning and the agreement may be violated by either player A or player B. In some cases, pronounced alliances simply become a concrete expression of what would have happened. And in other cases, an attempt to violate the alliance causes public condemnation, so players must put on the scales immediately achievable benefit from the violation of the alliance and the long-term loss arising from the fact that they will not be considered credible.
  Making decisions













If the game does not have a zero amount, then cooperation may also occur even if there are only two players. Assume that there is a terminal state with and that 1000 is the maximum possible utility for each player. In this case, the optimal strategy for both players is to do everything possible to achieve this state; This means that players will automatically cooperate to achieve the mutually desired goal.
created: 2014-09-23
updated: 2021-03-13
132518



Rating 9 of 10. count vote: 2
Are you satisfied?:



Comments


To leave a comment
If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.
To reply

Mathematical methods of research operations. The theory of games and schedules.

Terms: Mathematical methods of research operations. The theory of games and schedules.