I understand, to a certain degree, how the algorithm works. What I don't fully understand is how the algorithm is actually implemented in practice.
I'm interested in understanding what optimal approaches would be for a fairly complex game (maybe chess). i.e. recursive approach? async? concurrent? parallel? distributed? data structures and/or database(s)?
-- What type of limits would we expect to see on a single machine? (could we run concurrently across many cores... gpu maybe?)
-- If each branch results in a completely new game being played, (this could reach the millions) how do we keep the overall system stable? & how can we reuse branches already played?
What is Monte Carlo Tree Search ? MCTS is an algorithm that figures out the best move out of a set of moves by Selecting → Expanding → Simulating → Updating the nodes in tree to find the final solution. This method is repeated until it reaches the solution and learns the policy of the game.
Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate effectively without any knowledge in the particular domain, apart from the rules and end conditions, and can find its own moves and learn from them by playing random playouts.
For practical purposes, MCTS really should be considered to be a Model-Based method.
Monte Carlo Tree Search (MCTS) is a method for making optimal decisions in artificial intelligence (AI) problems, typically move planning in combinatorial games. It combines the generality of random simulation with the precision of tree search.
recursive approach? async? concurrent? parallel? distributed? data structures and/or database(s)
while
loops is just fine.minimax
, you actually do need to explicitly build a tree and store it in memory (it is built up gradually as the algorithm is running). So, you'll want a general tree data structure with Nodes
that have a list of successor / child Nodes
, and also a pointer back to the parent Node
(required for backpropagation of simulation outcomes).What type of limits would we expect to see on a single machine? (could we run concurrently across many cores... gpu maybe?)
Running across many cores can be done yes (see point about parallelization above). I don't see any part of the algorithm being particularly well-suited for GPU implementations (there are no large matrix multiplications or anything like that), so GPU is unlikely to be interesting.
If each branch results in a completely new game being played, (this could reach the millions) how do we keep the overall system stable? & how can we reuse branches already played?
In the most commonly-described implementation, the algorithm creates only one new node to store in memory per iteration/simulation in the expansion phase (the first node encountered after the Selection phase). All other game states generated in the play-out phase of the same simulation do not get any nodes to store in memory at all. This keeps memory usage in check, it means your tree only grows relatively slowly (at a rate of 1 node per simulation). It does mean you get slightly less re-usage of previously-simulated branches, because you don't store everything you see in memory. You can choose to implement a different strategy for the expansion phase (for example, create new nodes for all game states generated in the play-out phase). You'll have to carefully monitor memory usage if you do this though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With