Edit: Uploded the full source code if you want to see if you can get the AI to perform better: https://www.dropbox.com/s/ous72hidygbnqv6/MCTS_TTT.rar Edit: The search space is searched and moves resulting in losses are found. But moves resulting in losses are not visited very often due to the UCT algorithm. To learn about MCTS (Monte Carlo Tree Search) I've used the algorithm to make an AI for the classic game of tic-tac-toe. I have implemented the algorithm using the following design: <img src="https://i.stack.imgur.com/EieiQ.png" alt="MCTS stages"> The tree policy is based on UCT and the default policy is to perform random moves until the game ends. What I have observed with my implementation is that the computer sometimes makes errorneous moves because it fails to "see" that a particular move will result in a loss directly. For instance: <img src="https://i.stack.imgur.com/qxgJQ.png" alt="Tic Tac Toe example"> Notice how the action 6 (red square) is valued slightly higher than the blue square and therefore the computer marks this spot. I think this is because the game policy is based on random moves and therefore a good chance exist that the human will not put a "2" in the blue box. And if the player does not put a 2 in the blue box, the computer is gaurenteed a win. My Questions 1) Is this a known issue with MCTS or is it a result of a failed implementation? 2) What could be possible solutions? I'm thinking about confining the moves in the selection phase but I'm not sure :-) The code for the core MCTS: <pre class="prettyprint"><code> //THE EXECUTING FUNCTION public unsafe byte GetBestMove(Game game, int player, TreeView tv) { //Setup root and initial variables Node root = new Node(null, 0, Opponent(player)); int startPlayer = player; helper.CopyBytes(root.state, game.board); //four phases: descent, roll-out, update and growth done iteratively X times //----------------------------------------------------------------------------------------------------- for (int iteration = 0; iteration < 1000; iteration++) { Node current = Selection(root, game); int value = Rollout(current, game, startPlayer); Update(current, value); } //Restore game state and return move with highest value helper.CopyBytes(game.board, root.state); //Draw tree DrawTree(tv, root); //return root.children.Aggregate((i1, i2) => i1.visits > i2.visits ? i1 : i2).action; return BestChildUCB(root, 0).action; } //#1. Select a node if 1: we have more valid feasible moves or 2: it is terminal public Node Selection(Node current, Game game) { while (!game.IsTerminal(current.state)) { List<byte> validMoves = game.GetValidMoves(current.state); if (validMoves.Count > current.children.Count) return Expand(current, game); else current = BestChildUCB(current, 1.44); } return current; } //#1. Helper public Node BestChildUCB(Node current, double C) { Node bestChild = null; double best = double.NegativeInfinity; foreach (Node child in current.children) { double UCB1 = ((double)child.value / (double)child.visits) + C * Math.Sqrt((2.0 * Math.Log((double)current.visits)) / (double)child.visits); if (UCB1 > best) { bestChild = child; best = UCB1; } } return bestChild; } //#2. Expand a node by creating a new move and returning the node public Node Expand(Node current, Game game) { //Copy current state to the game helper.CopyBytes(game.board, current.state); List<byte> validMoves = game.GetValidMoves(current.state); for (int i = 0; i < validMoves.Count; i++) { //We already have evaluated this move if (current.children.Exists(a => a.action == validMoves[i])) continue; int playerActing = Opponent(current.PlayerTookAction); Node node = new Node(current, validMoves[i], playerActing); current.children.Add(node); //Do the move in the game and save it to the child node game.Mark(playerActing, validMoves[i]); helper.CopyBytes(node.state, game.board); //Return to the previous game state helper.CopyBytes(game.board, current.state); return node; } throw new Exception("Error"); } //#3. Roll-out. Simulate a game with a given policy and return the value public int Rollout(Node current, Game game, int startPlayer) { Random r = new Random(1337); helper.CopyBytes(game.board, current.state); int player = Opponent(current.PlayerTookAction); //Do the policy until a winner is found for the first (change?) node added while (game.GetWinner() == 0) { //Random List<byte> moves = game.GetValidMoves(); byte move = moves[r.Next(0, moves.Count)]; game.Mark(player, move); player = Opponent(player); } if (game.GetWinner() == startPlayer) return 1; return 0; } //#4. Update public unsafe void Update(Node current, int value) { do { current.visits++; current.value += value; current = current.parent; } while (current != null); } </code></pre>

Ok, I solved the problem by adding the code: <pre class="prettyprint"><code> //If this move is terminal and the opponent wins, this means we have //previously made a move where the opponent can always find a move to win.. not good if (game.GetWinner() == Opponent(startPlayer)) { current.parent.value = int.MinValue; return 0; } </code></pre> I think the problem was that the search space was too small. This ensures that even if selection does select a move that is actually terminal, this move is never chosen and resource are used to explore other moves instead :). Now the AI vs AI always plays tie and the Ai is impossible to beat as human :-)

Monte Carlo Tree Search: Implementation for Tic-Tac-Toe

Tags:

c#

algorithm

artificial-intelligence

tic-tac-toe

montecarlo

Edit: Uploded the full source code if you want to see if you can get the AI to perform better: https://www.dropbox.com/s/ous72hidygbnqv6/MCTS_TTT.rar

Edit: The search space is searched and moves resulting in losses are found. But moves resulting in losses are not visited very often due to the UCT algorithm.

To learn about MCTS (Monte Carlo Tree Search) I've used the algorithm to make an AI for the classic game of tic-tac-toe. I have implemented the algorithm using the following design:

MCTS stages The tree policy is based on UCT and the default policy is to perform random moves until the game ends. What I have observed with my implementation is that the computer sometimes makes errorneous moves because it fails to "see" that a particular move will result in a loss directly.

For instance: Tic Tac Toe example Notice how the action 6 (red square) is valued slightly higher than the blue square and therefore the computer marks this spot. I think this is because the game policy is based on random moves and therefore a good chance exist that the human will not put a "2" in the blue box. And if the player does not put a 2 in the blue box, the computer is gaurenteed a win.

My Questions

1) Is this a known issue with MCTS or is it a result of a failed implementation?

2) What could be possible solutions? I'm thinking about confining the moves in the selection phase but I'm not sure :-)

The code for the core MCTS:

    //THE EXECUTING FUNCTION
    public unsafe byte GetBestMove(Game game, int player, TreeView tv)
    {

        //Setup root and initial variables
        Node root = new Node(null, 0, Opponent(player));
        int startPlayer = player;

        helper.CopyBytes(root.state, game.board);

        //four phases: descent, roll-out, update and growth done iteratively X times
        //-----------------------------------------------------------------------------------------------------
        for (int iteration = 0; iteration < 1000; iteration++)
        {
            Node current = Selection(root, game);
            int value = Rollout(current, game, startPlayer);
            Update(current, value);
        }

        //Restore game state and return move with highest value
        helper.CopyBytes(game.board, root.state);

        //Draw tree
        DrawTree(tv, root);

        //return root.children.Aggregate((i1, i2) => i1.visits > i2.visits ? i1 : i2).action;
        return BestChildUCB(root, 0).action;
    }

    //#1. Select a node if 1: we have more valid feasible moves or 2: it is terminal 
    public Node Selection(Node current, Game game)
    {
        while (!game.IsTerminal(current.state))
        {
            List<byte> validMoves = game.GetValidMoves(current.state);

            if (validMoves.Count > current.children.Count)
                return Expand(current, game);
            else
                current = BestChildUCB(current, 1.44);
        }

        return current;
    }

    //#1. Helper
    public Node BestChildUCB(Node current, double C)
    {
        Node bestChild = null;
        double best = double.NegativeInfinity;

        foreach (Node child in current.children)
        {
            double UCB1 = ((double)child.value / (double)child.visits) + C * Math.Sqrt((2.0 * Math.Log((double)current.visits)) / (double)child.visits);

            if (UCB1 > best)
            {
                bestChild = child;
                best = UCB1;
            }
        }

        return bestChild;
    }

    //#2. Expand a node by creating a new move and returning the node
    public Node Expand(Node current, Game game)
    {
        //Copy current state to the game
        helper.CopyBytes(game.board, current.state);

        List<byte> validMoves = game.GetValidMoves(current.state);

        for (int i = 0; i < validMoves.Count; i++)
        {
            //We already have evaluated this move
            if (current.children.Exists(a => a.action == validMoves[i]))
                continue;

            int playerActing = Opponent(current.PlayerTookAction);

            Node node = new Node(current, validMoves[i], playerActing);
            current.children.Add(node);

            //Do the move in the game and save it to the child node
            game.Mark(playerActing, validMoves[i]);
            helper.CopyBytes(node.state, game.board);

            //Return to the previous game state
            helper.CopyBytes(game.board, current.state);

            return node;
        }

        throw new Exception("Error");
    }

    //#3. Roll-out. Simulate a game with a given policy and return the value
    public int Rollout(Node current, Game game, int startPlayer)
    {
        Random r = new Random(1337);
        helper.CopyBytes(game.board, current.state);
        int player = Opponent(current.PlayerTookAction);

        //Do the policy until a winner is found for the first (change?) node added
        while (game.GetWinner() == 0)
        {
            //Random
            List<byte> moves = game.GetValidMoves();
            byte move = moves[r.Next(0, moves.Count)];
            game.Mark(player, move);
            player = Opponent(player);
        }

        if (game.GetWinner() == startPlayer)
            return 1;

        return 0;
    }

    //#4. Update
    public unsafe void Update(Node current, int value)
    {
        do
        {
            current.visits++;
            current.value += value;
            current = current.parent;
        }
        while (current != null);
    }

624

asked May 22 '14 09:05

MortenGR

2 Answers

I think your answer shouldn't be marked as accepted. For Tic-Tac-Toe the search space is relatively small and optimal action should be found within a reasonable number of iterations.

It looks like your update function (backpropagation) adds the same amount of reward to nodes at different tree levels. This is not correct, since states current players are different at different tree levels.

I suggest you take a look at backpropagation in the UCT method from this example: http://mcts.ai/code/python.html

You should update node's total reward based on the reward calculated by previous player at specific level (node.playerJustMoved in the example).

155

answered Sep 29 '22 10:09

inejc

Ok, I solved the problem by adding the code:

        //If this move is terminal and the opponent wins, this means we have 
        //previously made a move where the opponent can always find a move to win.. not good
        if (game.GetWinner() == Opponent(startPlayer))
        {
            current.parent.value = int.MinValue;
            return 0;
        }

I think the problem was that the search space was too small. This ensures that even if selection does select a move that is actually terminal, this move is never chosen and resource are used to explore other moves instead :).

Now the AI vs AI always plays tie and the Ai is impossible to beat as human :-)

answered Sep 29 '22 10:09

MortenGR

Related questions
                            
                                Unpredictible behaviour in c# dynamic
                            
                                How to force hangfire server to remove old server data for that particular server on restart?
                            
                                Working way to make video from images in C#
                            
                                Visual Studio Designer is always trying to change my control
                            
                                Dependency Injection Frameworks: Why do I care?
                            
                                Method overload resolution unexpected behavior
                            
                                What happens when user click .NET assembly (EXE)?
                            
                                Filter the "Includes" table on Entity Framework query
                            
                                What is the correct way to use TypeForwardedToAttribute?
                            
                                What could be causing a "Cannot access a disposed object" error in WCF?
                            
                                Value types inherit from System.Object...why? [duplicate]
                            
                                Parallel.Foreach exceptions and cancel
                            
                                jQuery Infinite Scroll and Gridview
                            
                                C# convert RGB value to CMYK using an ICC profile?
                            
                                Reading Group Policy Settings using C#
                            
                                Is there any way to detect a mouseclick outside a user control?
                            
                                How to use eclipse for C# development? [duplicate]
                            
                                WCF REST, streamed upload of files and httpRuntime maxRequestLength property
                            
                                What is the correct use of IDatabaseInitializer in EF?
                            
                                Find USB drive letter from VID/PID (Needed for XP and higher)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Monte Carlo Tree Search: Implementation for Tic-Tac-Toe

Tags:

c#

algorithm

artificial-intelligence

tic-tac-toe

montecarlo

MortenGR

People also ask

2 Answers

inejc

MortenGR

Recent Activity

Donate For Us