Zero sum games & state space search: Tic-Tac-Toe

In my previous post I covered using search to solve puzzles by abstracting the state space into a tree structure and using various graph-searching algorithms to find a solution. In this post I will follow up on that concept by applying it to adversarial zero-sum games – turn-based games where you compete against another entity, be it another person or a computer. In a zero-sum game one player's gain is exactly the other player's loss, so the net benefit to both players always adds up to zero; each move should therefore strengthen your own position while weakening your opponent's. Some examples of this kind of game are chess, poker, checkers, Connect Four, and Go.

The focus of this post will be on creating a computer-based adversary that makes intelligent choices about which move to play next – an artificial intelligence. To do this we will be implementing one of the simplest zero-sum games there is – perhaps THE simplest: Tic-Tac-Toe, sometimes called X’s & O’s.

Tic-Tac-Toe

Tic tac toe is played on a 3×3 grid of empty spaces, with players placing their symbol in one of the boxes each turn. The goal is to get three of your symbols in a straight line – vertical, horizontal, or diagonal – before your opponent can.

A winning placement

Before we can start working on the tree search portion of the game, we should first implement the basics of the game with a dumb-strategy adversary, giving us a framework from which to develop the more intelligent AI player.

A non-intelligent AI Implementation

Since the game is played on a grid, we will be using a vector of vectors to represent the board. When a player attempts to make a move, we need to validate it: the move must fall within the grid, in a space that is not already occupied.

typedef vector<vector<int>> Board;

Board newBoard() {
    Board board = {{0,0,0},{0,0,0},{0,0,0}};
    return board;
}

bool validate(Board& b, int r, int c) {
    if (r >= 0 && r < 3 && c >= 0 && c < 3) {
        if (b[r][c] == 0)
            return true;
    }
    return false;
}
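The functions that follow also call a makeMove() helper that I won't show elsewhere. A minimal sketch of it – assuming it simply validates the move and then writes the player's symbol – might look like this:

```cpp
#include <vector>
using std::vector;

typedef vector<vector<int>> Board;

bool validate(Board& b, int r, int c) {
    return r >= 0 && r < 3 && c >= 0 && c < 3 && b[r][c] == 0;
}

// Assumed implementation: write the player's symbol only if the move is legal.
void makeMove(Board& b, int r, int c, int player) {
    if (validate(b, r, c))
        b[r][c] = player;
}
```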

Because the focus of this post is on creating intelligent computer-based opponents, I’m keeping this a simple console-based game. When it is the player’s turn we require them to enter the row/column of the space they want, and then print the new board configuration.

void printBoard(Board& board) {
    cout<<"______\n";
    for (int i = 0; i < 3; i++) {
        cout<<"|";
        for (int j = 0; j < 3; j++) {
            if (board[i][j] == 1)
                cout<<"O|";
            else if (board[i][j] == 2) {
                cout<<"X|";
            } else {
                cout<<" |";
            }
        }
        cout<<"\n______\n";
    }
}

void player_take_turn(Board& board) {
    int row = 0, col = 0;
    do {
        cout<<"row: ";
        cin>>row;
        cout<<"col: ";
        cin>>col;
    } while (!validate(board, row, col));
    makeMove(board, row, col, PLAYER);
    printBoard(board);
}

I used integers to represent the board, with 0 being a blank space, 1 being player one, and 2 being player two. You could just as easily use characters. Our first iteration of the computer player is going to be very simple: it generates a random board position; if the space is not occupied it chooses that position, otherwise it keeps trying until it finds a valid position. This leads to a very dumb opponent that is very easy to beat. Of course, we also need a way to check if the most recent move was a winning move.

std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<int> dist(0, 2);
void computer_take_turn(Board& board) {
    int r = 0, c = 0;
    do {
        r = dist(gen);
        c = dist(gen);
    } while (board[r][c] != 0);
    makeMove(board, r, c, COMPUTER);
    printBoard(board);
}

int findWinner(Board& board) {
    int winner = 0;
    //check all vertical and horizontal winning states
    for (int i = 0; i < 3; i++) {
        if (board[i][0] == 1 && board[i][1] == 1 && board[i][2] == 1) {winner = 1; break;}
        if (board[i][0] == 2 && board[i][1] == 2 && board[i][2] == 2) {winner = 2; break;}
        if (board[0][i] == 1 && board[1][i] == 1 && board[2][i] == 1) {winner = 1; break;}
        if (board[0][i] == 2 && board[1][i] == 2 && board[2][i] == 2) {winner = 2; break;}
    }
    //check diagonal winning states
    if (board[0][0] == 1 && board[1][1] == 1 && board[2][2] == 1) winner = 1;
    if (board[0][0] == 2 && board[1][1] == 2 && board[2][2] == 2) winner = 2;  
    if (board[0][2] == 1 && board[1][1] == 1 && board[2][0] == 1) winner = 1;
    if (board[0][2] == 2 && board[1][1] == 2 && board[2][0] == 2) winner = 2;
    return winner;
}

int evaluateBoard(Board& board, int player) {
    if (findWinner(board) == player) {
        cout<<"Game Over!\n";
        gameOver = true;
        if (player == PLAYER) cout<<"Player wins!\n";
        else cout<<"Computer Wins!\n";
        return 1000;
    }
    return 0;
}

Being a turn-based game, we need an event loop to track and control the state of the game (game on vs. game over) and whose turn it is.

//states & constants
const int PLAYER = 1;
const int COMPUTER = 2;
bool gameOver = false;
bool playerTurn = true;

void gameLoop() {
    Board board = newBoard();
    gameOver = false;
    while (!gameOver) {
        if (playerTurn) {
            player_take_turn(board);
            playerTurn = false;
            evaluateBoard(board, PLAYER);
            if (gameOver) break;
        } else {
            computer_take_turn(board);
            playerTurn = true;
            evaluateBoard(board, COMPUTER);
        }
    }
}

Ok, now we technically have a complete game of tic-tac-toe, albeit one whose opponent AI has a strategy on par with a 3-year-old’s.

 row: 0 
col: 2

| | |O|
| | |O|
| |X| |


| |X|O|
| | |O|
| |X| |

row: 2
col: 2

| |X|O|
| | |O|
| |X|O|

Game Over!
Player wins!

The above is the end game of a match played against our current artificial “intelligence”. As you can see, it would have been simple to block my win, but because it is not using any intelligent strategy and is just placing pieces randomly, it allowed me an easy win. Let’s try to put a couple of braincells into our AI.

Creating an Opponent that “thinks” Before it Acts

Just as we did for the N-Queens Problem and tile-sliding puzzles, we will be treating each board configuration as its own state, with each possible next position representing a state change – par for the course when it comes to search problems. Unlike those problems, however, this time we are not only interested in locally optimal positions: we are also interested in placing our opponent in the least optimal position possible.

John von Neumann – the man who brought us the current model of computation and such gems as the mergesort algorithm – also came up with what’s called the minimax theorem in the late 1920s, launching the modern-day field of game theory. The theorem gives its name to the minimax algorithm, which is the de facto starting point for implementing all other zero-sum game-playing AIs.

The Minimax Algorithm

Minimax works on the principle of there being a maximizing player and a minimizing player. That is, when one opponent is on the offense, the other is on the defense, and their actions are interdependent.

Minimax makes use of backtracking depth-first search to find the series of moves that produces the desired result. Unlike regular backtracking DFS, however, minimax alternates between two objectives as it recurses: it is concerned not only with the best score for the player, but also with the worst possible score for the opponent, since forcing that may turn out to be the better choice. The recursive calls are made in such a way that they form a minimax tree, where each level of the tree alternates between player and opponent.

It’s one of those algorithms that is tricky to describe, but the code is easy to understand, so let’s take a peek at some pseudocode to make things a bit clearer.

procedure minimax(Node, Maximizer) {
    score := evaluate_node(Node);
    if (score is WIN or LOSE or DRAW)
        return score;

    if (Maximizer) {
        val := -INF;
        for (each child of Node)
            val := max(val, minimax(child, false));
        return val;
    } else {
        val := INF;
        for (each child of Node)
            val := min(val, minimax(child, true));
        return val;
    }
}

On each call, the algorithm scores the board. We are interested in three particular results: the maximizer wins, the maximizer loses, or the game is a draw. If the game is not yet at an end state, the recursion continues, with the role of maximizer reversing on each new level of recursion: this is what builds the turn-ordered minimax tree.

Implementing a Tic-Tac-Toe Playing AI Using Minimax

The actual search is straightforward to implement, as can be seen from the pseudocode. The relation to backtracking is obvious:

const int WIN = 1000;
const int LOSE = -1000;

int minimax(Board& board, int player) {
    //score terminal states with findWinner() directly; the console-printing
    //evaluateBoard() above would spam output during the search
    int winner = findWinner(board);
    if (winner == COMPUTER) return WIN;
    if (winner == PLAYER) return LOSE;
    if (countBlanks(board) == 0)
        return 0;
    if (player == COMPUTER) {
        int value = -31337;
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                if (board[i][j] == 0) {
                    board[i][j] = player;
                    value = max(value, minimax(board, PLAYER));
                    board[i][j] = 0;
                }
            }
        }
        return value;
    } else {
        int value = 31337;
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                if (board[i][j] == 0) {
                    board[i][j] = player;
                    value = min(value, minimax(board, COMPUTER));
                    board[i][j] = 0;
                }
            }
        }
        return value;
    }
}
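minimax() relies on a countBlanks() helper that I haven't defined elsewhere. A simple version – assuming it just counts the empty cells left on the board – could be:

```cpp
#include <vector>
using std::vector;

typedef vector<vector<int>> Board;

// Assumed helper: number of empty cells remaining.
// When it hits zero with no winner, the game is a draw.
int countBlanks(Board& b) {
    int blanks = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            if (b[i][j] == 0)
                blanks++;
    return blanks;
}
```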

The search simply tries every possible combination of moves until it reaches an end state, backtracking to try the next combination. During this process it keeps track of which move leads to the maximum score for the maximizer and which move limits the opponent to the minimum score.

The computer_take_turn() function (replacing the random version above) runs the minimax search for every possible next board from the current one. This is similar to the inner loop of minimax, using the same backtracking technique.

 
struct Move {
    int row;
    int col;
    int val;
};

void computer_take_turn(Board& board) {
    Move best = {-1, -1, -31337};
    int val = 0;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            if (board[i][j] == 0) {
                board[i][j] = COMPUTER;
                //after the computer's trial move it is the player's turn
                val = minimax(board, PLAYER);

                cout<<"Minimax val: "<<val<<"\n";
                if (val > best.val) {
                    best.val = val;
                    best.row = i;
                    best.col = j;
                }
                board[i][j] = 0;
            }
        }
    }
    if (best.row == -1) {
        cout<<"Can't make a move.\n";
        gameOver = true;
        return;
    }
    cout<<"Best Move: "<<best.row<<"/"<<best.col<<"\n";
    makeMove(board, best.row, best.col, COMPUTER);
    printBoard(board);
}

And that’s it! We now have an opponent that will always choose the optimal next move.

Closing Remarks

It should be mentioned that while this strategy plays perfectly – the computer will never lose – it doesn’t feel dramatically better than the randomized “just try whatever” approach shown first. That, however, is down to the nature of tic-tac-toe, where optimal play by both sides always ends in a draw; it is not a shortcoming of the algorithm.

When minimax is used to build an AI for a game like Connect Four or chess, where the search space is HUGE and the game requires knowledge and strategy, the results are much more impressive.

Minimax is the basis for more advanced techniques such as alpha-beta pruning. There are also maximin and negamax, which are variations on the same technique as minimax.
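To give a taste of where that leads: alpha-beta pruning keeps two bounds – the best score each side can already guarantee – and stops exploring a branch as soon as it cannot affect the final choice. Below is a sketch of the same tic-tac-toe search with cutoffs, restating the board representation and win check from this post in compact form (the function name minimaxAB is my own, not from any library):

```cpp
#include <vector>
#include <algorithm>
using namespace std;

typedef vector<vector<int>> Board;
const int PLAYER = 1;
const int COMPUTER = 2;

// Compact restatement of the win check from earlier in the post.
int findWinner(Board& b) {
    for (int p = 1; p <= 2; p++) {
        for (int i = 0; i < 3; i++) {
            if (b[i][0] == p && b[i][1] == p && b[i][2] == p) return p;
            if (b[0][i] == p && b[1][i] == p && b[2][i] == p) return p;
        }
        if (b[0][0] == p && b[1][1] == p && b[2][2] == p) return p;
        if (b[0][2] == p && b[1][1] == p && b[2][0] == p) return p;
    }
    return 0;
}

// Minimax with alpha-beta cutoffs. alpha = best score the maximizer has
// already guaranteed, beta = best (lowest) score the minimizer has guaranteed.
int minimaxAB(Board& board, int player, int alpha, int beta) {
    int winner = findWinner(board);
    if (winner == COMPUTER) return 1000;
    if (winner == PLAYER) return -1000;
    bool full = true;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            if (board[i][j] == 0) full = false;
    if (full) return 0;   //no blanks and no winner: draw
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            if (board[i][j] != 0) continue;
            board[i][j] = player;
            int val = minimaxAB(board, player == COMPUTER ? PLAYER : COMPUTER,
                                alpha, beta);
            board[i][j] = 0;
            if (player == COMPUTER) alpha = max(alpha, val);
            else                    beta = min(beta, val);
            if (alpha >= beta)      //neither side will allow this branch
                return player == COMPUTER ? alpha : beta;
        }
    }
    return player == COMPUTER ? alpha : beta;
}
```

Called with a full window at the root (e.g. alpha = -31337, beta = 31337), the pruned search returns the same root value as plain minimax – it just skips branches that cannot matter.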