Writing a tic-tac-toe solver using minimax

Neel Somani - September 6, 2017

In this post, we’ll build a tic-tac-toe solver using the minimax algorithm. There are a few steps. First, we’ll need to generate a game tree of all possible moves and outcomes. Then, we’ll write the minimax code to calculate the optimal move.

Here’s a link to the GitHub repo. Note: The code in the GitHub repo is slightly refactored, but the logic is all the same.

1. Starter code

Because the focus of this post isn’t really about creating the tic-tac-toe game itself, I’ve included some starter code below. We’re going to create a class called GameTree, which will represent a game state: the current state of the board, and whose move it is. The GameTree will also store all child GameTrees, which are defined as any game state that is reachable within the opponent’s next move.

We’ll internally represent a board as a string of 9 characters, consisting of X, O, and spaces. The string will represent the board flattened. (e.g., The string "XXO XO X" would represent the board with the top row as "X X O", the next row as "_ X O", and the bottom row as "_ _ X".)

Before representing the game tree with a class, I was initially concerned about performance. I remember that I previously ran into issues when using a binary search tree class in Python with a few hundred million elements. But a loose upper bound on the number of tic-tac-toe boards is 3^9 (3 possible values for each square: X, O, and space) = 19683, and it’s loose since not all of those formations are valid (e.g., you can’t have three X’s in a row and three O’s in a row on the same board, the number of X's and O's have to be within one, etc.). Python should be able to handle 20000 elements easily.

class GameTree:
    # A list to switch between players
    players = ["X", "O"]

    def __init__(self, value="         ", player_number=0):
        self.value = value
        self.children = []
        self.player_number = player_number
        # TODO: Implement self.generate_children().
        self.generate_children()

    def print_tree(self):
        """ Print in a user-friendly way. """
        t = self.value
        print(t[0] + " | " + t[1] + " | " + t[2])
        print("---------")
        print(t[3] + " | " + t[4] + " | " + t[5])
        print("---------")
        print(t[6] + " | " + t[7] + " | " + t[8])

I’ve left a TODO for the self.generate_children() function, which we’ll implement in the next step.

2. Generating the game tree

The self.generate_children() function should reassign the self.children variable to a list of child GameTrees. Before we write that function, though, we need to know whether a game state is terminal, that is, whether either player has won, or whether the board is full.

    def is_win(self, player_number):
        """ Determine if the player has won the game. """
        player = GameTree.players[player_number]
        t = self.value
        return any([(t[3 * i] == player and t[3 * i + 1] == player and t[3 * i + 2] == player) for i in range(3)]) or \
            any([(t[i] == player and t[i + 3] == player and t[i + 6] == player) for i in range(3)]) or \
            (t[0] == player and t[4] == player and t[8] == player) or (t[2] == player and t[4] == player and t[6] == player)

To see whether the board is full, we’ll check for the presence of " " in the board string.

To generate all child trees, we’ll get all of the indices of the current tree.value string that are spaces (which represent available moves), and generate the trees corresponding to the player placing their symbol in each location.

    def generate_children(self):
        """ Generate the children of this node. """
        if self.is_win(self.player_number) or self.is_win((self.player_number + 1) % 2) or " " not in self.value:
            return
        children_indices = [position[0] for position in enumerate(self.value) if position[1] == " "]
        child_trees = []
        for i in children_indices:
            child_tree = GameTree(self.value[:i] + GameTree.players[self.player_number] + self.value[i+1:], (self.player_number + 1) % 2)
            child_trees.append(child_tree)
        self.children = child_trees

3. The minimax code

Let’s write a depth-limited minimax algorithm. We’ll let the user specify how deep into the minimax tree we’ll traverse. If that’s the case, then we’ll need some sort of heuristic to evaluate a game state if it’s not terminal. (If a game state is terminal, we can just value it as 1, 1/2, or 0 in the cases of winning, tying, and losing respectively.)

Let’s use the "probability of winning" as the heuristic, where the probability of winning is defined as the weighted proportion of descendent leaves that are winning states (valued at 1), tied states (valued at 1/2) and losing states (valued at 0). For example, if there are three possible child moves in a position, one of which is a win, one is a tie, and one is a loss, then we’ll value the game state as (1/3) * (1 + 1/2 + 0) = 1/2.

Here’s our heuristic function:

    def probability_of_winning(self, player_number):
        """ Give the probability of player winning this game. """
        if self.is_win(player_number):
            return 1
        elif self.is_win((self.player_number + 1) % 2):
            return 0
        elif " " not in self.value:
            return .5
        return sum([child.probability_of_winning(player_number) for child in self.children]) / len(self.children)

And now we can write the minimax algorithm with an arbitrary depth. We’ll decrement the depth each time we hit a minimizer node.

    def minimax(self, depth=2):
        """ Run minimax with the current player as max. Decrement the depth for each
        minimizer. Return the appropriate child tree. """

        def maximize(tree, depth):
            # Return the maximum of all child minimizer values.
            if len(tree.children) == 0:
                return (tree.value, tree.probability_of_winning(self.player_number))
            child_values = [minimize(child, depth - 1) for child in tree.children]
            return (tree.value, max(child_values, key=lambda k: k[1])[1])

        def minimize(tree, depth):
            # Return the minimum of all child maximizer values.
            if depth == 0 or len(tree.children) == 0:
                return (tree.value, tree.probability_of_winning(self.player_number))
            child_values = [maximize(child, depth) for child in tree.children]
            return (tree.value, min(child_values, key=lambda k: k[1])[1])

        child_values = [minimize(child, depth) for child in self.children]
        child_index = max(enumerate(child_values), key=lambda k: k[1][1])[0]
        return self.children[child_index]

And we’re done. We can see the algorithm play against itself with the following function:

    def run_minimax_game(self, depth=2):
        """ Automatically execute a minimax game. """
        tree = self
        move = 0
        while not tree.is_win(self.player_number) and not tree.is_win((self.player_number + 1) % 2) and " " in tree.value:
            print("Move number: " + str(move))
            tree.print_tree()
            tree = tree.minimax(depth)
            move += 1
        print("Move number: " + str(move))
        tree.print_tree()

4. The probability heuristic

Is the probability heuristic alone really that much worse than minimax? In short, yes. Try the following sample game tree.

sample = GameTree('OXO X   X', 1)
sample.print_tree()

In this position, it’s O to move, and the optimal move is clearly to place in the middle column of the bottom row, to prevent X from winning. Let’s write a helper function to get the "optimal" move that the probability heuristic identifies.

    def move_by_probability(self):
        """ Return the move that maximizes the probability of winning. """
        children_with_probabilities = [(child, child.probability_of_winning(self.player_number)) for child in self.children]
        return max(children_with_probabilities, key=lambda k: k[1])[0]

If you run sample.move_by_probability.print_tree(), you’ll find that O plays an entirely suboptimal move — a move that does maximize the proportion of situations in which O wins (if the moves were random), but doesn’t prevent X from winning if X plays optimally.

1. Starter code

2. Generating the game tree

3. The minimax code

4. The probability heuristic

About the Author