 

Monte Carlo Tree Search UCT implementation


Can you explain to me how to build the tree?

I mostly understand how the nodes are chosen, but a clearer explanation would really help me implement this algorithm. I already have a board representing the game state, but I don't know how to generate the tree.

Can someone point me to a well-commented implementation of the algorithm (I need to use it for an AI)? Or a better explanation or examples of it?

I haven't found many resources on the net; this algorithm is rather new...

asked Jan 29 '12 by Makers_F

People also ask

How does the Monte Carlo search tree work?

What is Monte Carlo Tree Search? MCTS is an algorithm that figures out the best move out of a set of moves by Selecting → Expanding → Simulating → Updating the nodes in a tree to find the final solution. This method is repeated until it reaches the solution and learns the policy of the game.

What is UCT in MCTS?

One particular selection mechanism that has proved to be reliable is based on the Upper Confidence Bounds for Trees, commonly referred to as UCT. The UCT attempts to balance exploration and exploitation by considering the values stored in the statistical tree of the MCTS.
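The UCT rule described above can be written as a small scoring function. This is only a sketch: the names `child_wins`, `child_visits`, and `parent_visits` are my own, and the exploration constant `c` defaults to the textbook value of √2.

```python
import math

def uct_score(child_wins, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 applied to trees: average payoff plus an exploration bonus."""
    if child_visits == 0:
        return math.inf  # unvisited children are always tried first
    exploitation = child_wins / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```

During selection the algorithm simply descends to the child with the highest score at each level; the exploration term shrinks as a child accumulates visits, steering the search toward under-explored moves.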

Is Monte Carlo tree search better than Minimax?

Studies show that MCTS does not detect shallow traps (where opponents can win within a few moves) as well as minimax search does. Thus, minimax search performs better than MCTS in games like Chess, which can end instantly (king is captured).

How is a leaf node determined in Monte Carlo tree search during the selection phase?

MCTS uses the Upper Confidence Bound (UCB) formula applied to trees as the strategy in the selection process to traverse the tree. It balances the exploration-exploitation trade-off. During tree traversal, the child that maximizes this value is selected at each level.


1 Answer

The best way to generate the tree is a series of random playouts. The trick is balancing exploration and exploitation (this is where the UCT comes in). There are some good code samples and plenty of research paper references here: https://web.archive.org/web/20160308043415/http://mcts.ai:80/index.html

When I implemented the algorithm, I used random playouts until I hit an end point or termination state. I had a static evaluation function that would calculate the payoff at this point, then the score from this point is propagated back up the tree. Each player or "team" assumes that the other team will play the best move for themselves, and the worst move possible for their opponent.

I would also recommend checking out the papers by Chaslot and his PhD thesis, as well as some of the research that references his work (basically all the MCTS work since then).


For example: Player 1's first move could simulate 10 moves into the future, alternating between player 1 moves and player 2 moves. Each time you must assume that the opposing player will try to minimize your score whilst maximizing their own score. There is an entire field based on this known as Game Theory. Once you simulate to the end of 10 games, you iterate from the start point again (because there is no point in only simulating one set of decisions). Each of these branches of the tree must be scored, where the score is propagated up the tree and represents the best possible payoff for the player doing the simulating, assuming that the other player is also choosing the best moves for themselves.

MCTS consists of four strategic steps, repeated as long as there is time left. The steps are as follows.

  1. In the selection step the tree is traversed from the root node until we reach a node E, where we select a position that is not added to the tree yet.

  2. Next, during the play-out step moves are played in self-play until the end of the game is reached. The result R of this “simulated” game is +1 in case of a win for Black (the first player in LOA), 0 in case of a draw, and −1 in case of a win for White.

  3. Subsequently, in the expansion step children of E are added to the tree.

  4. Finally, R is propagated back along the path from E to the root node in the backpropagation step. When time is up, the move played by the program is the child of the root with the highest value.

(This example is taken from this paper (PDF): www.ru.is/faculty/yngvi/pdf/WinandsBS08.pdf)
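The four steps above can be put together in a compact sketch. The game here is a toy Nim variant (a pile of objects, take 1 or 2 per turn, whoever takes the last object wins); the game, the node fields, and the constants are all my own choices for illustration, not from the paper.

```python
import math
import random

def legal_moves(pile):
    """Toy Nim: take 1 or 2 objects; whoever takes the last one wins."""
    return [m for m in (1, 2) if m <= pile]

class Node:
    """One game position plus the statistics MCTS keeps for it."""
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile
        self.parent = parent
        self.move = move              # the move that led to this position
        self.children = []
        self.untried = legal_moves(pile)
        self.visits = 0
        self.wins = 0.0               # payoff total, from the view of the player who made `move`

def uct_child(node, c=math.sqrt(2)):
    """Pick the child maximizing mean payoff plus exploration bonus."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(pile):
    """Random self-play; +1 if the player to move on `pile` wins."""
    if pile == 0:
        return -1                     # the previous player took the last object
    return -rollout(pile - random.choice(legal_moves(pile)))

def mcts(root_pile, iterations=500):
    root = Node(root_pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one position not yet in the tree.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.pile - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Play-out: random self-play from the new position gives result R.
        result = rollout(node.pile)
        # 4. Backpropagation: push R back to the root, flipping sign per level.
        while node is not None:
            node.visits += 1
            node.wins += -result
            result = -result
            node = node.parent
    # When time is up, play the most-visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

From a pile of 2, taking both objects wins immediately, so the search concentrates its visits on move 2. Note the sign flip during backpropagation: each node's statistics are kept from the perspective of the player who moved into it, which is what makes the UCT selection at every level pick the best move for the player whose turn it is.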

Here are some implementations:

A list of libraries and games using some MCTS implementations http://senseis.xmp.net/?MonteCarloTreeSearch

and a game independent open source UCT MCTS library called Fuego http://fuego.sourceforge.net/fuego-doc-1.1/smartgame-doc/group__sguctgroup.html

answered Oct 22 '22 by danielbeard