How is Monte Carlo Tree Search implemented in practice

Tags:

I understand, to a certain degree, how the algorithm works. What I don't fully understand is how the algorithm is actually implemented in practice.

I'm interested in understanding what optimal approaches would be for a fairly complex game (maybe chess). i.e. recursive approach? async? concurrent? parallel? distributed? data structures and/or database(s)?

-- What type of limits would we expect to see on a single machine? (could we run concurrently across many cores... gpu maybe?)

-- If each branch results in a completely new game being played, (this could reach the millions) how do we keep the overall system stable? & how can we reuse branches already played?

726

asked Mar 21 '18 02:03

rambossa

1 Answers

recursive approach? async? concurrent? parallel? distributed? data structures and/or database(s)

In MCTS, there's not much of a point in a recursive implementation (which is common in other tree search algorithms like the minimax-based ones), because you always go "through" a game in sequences from current game state (root node) till game states you choose to evaluate (terminal game states, unless you choose to go with a non-standard implementation using a depth limit on the play-out phase and a heuristic evaluation function). The much more obvious implementation using while loops is just fine.
If it's your first time implementing the algorithm, I'd recommend just going for a single-threaded implementation first. It is a relatively easy algorithm to parallelize though, there are multiple papers on that. You can simply run multiple simulations (where simulation = selection + expansion + playout + backpropagation) in parallel. You can try to make sure everything gets updated cleanly during backpropagation, but you can also simply decide to not use any locks / blocking etc. at all, there's already enough randomness in all the simulations anyway so if you lose information from a couple of simulations here and there due to naively-implemented parallelization it really doesn't hurt too much.
As for data structures, unlike algorithms like minimax, you actually do need to explicitly build a tree and store it in memory (it is built up gradually as the algorithm is running). So, you'll want a general tree data structure with Nodes that have a list of successor / child Nodes, and also a pointer back to the parent Node (required for backpropagation of simulation outcomes).

What type of limits would we expect to see on a single machine? (could we run concurrently across many cores... gpu maybe?)

Running across many cores can be done yes (see point about parallelization above). I don't see any part of the algorithm being particularly well-suited for GPU implementations (there are no large matrix multiplications or anything like that), so GPU is unlikely to be interesting.

If each branch results in a completely new game being played, (this could reach the millions) how do we keep the overall system stable? & how can we reuse branches already played?

In the most commonly-described implementation, the algorithm creates only one new node to store in memory per iteration/simulation in the expansion phase (the first node encountered after the Selection phase). All other game states generated in the play-out phase of the same simulation do not get any nodes to store in memory at all. This keeps memory usage in check, it means your tree only grows relatively slowly (at a rate of 1 node per simulation). It does mean you get slightly less re-usage of previously-simulated branches, because you don't store everything you see in memory. You can choose to implement a different strategy for the expansion phase (for example, create new nodes for all game states generated in the play-out phase). You'll have to carefully monitor memory usage if you do this though.

176

answered Sep 29 '22 21:09

Dennis Soemers

Related questions
                            
                                Approximation algorithm for TSP variant, fixed start and end anywhere but starting point + multiple visits at each vertex ALLOWED
                            
                                Why are game renderers so fast compared to 3d applications' renderers?
                            
                                Determining if an array has a k-majority element
                            
                                Delete max in O(1) and insert in O(logn)
                            
                                What are a posteriori and a priori analyses of algorithm operations?
                            
                                All natural numbers which sum up to N and where the inverses sum up to one
                            
                                Is there a variant of min_element which takes a projection function?
                            
                                How to get the nearest point outside a polygon from a point inside a polygon?
                            
                                Best dynamic data structure for 2d circle nearest neighbor
                            
                                Number of swaps in a permutation [duplicate]
                            
                                Data Structure for Quickly Checking Numbers against Overlapping Ranges
                            
                                What are the possible ways to encode a counter in a byte?
                            
                                Text Search based algorithm not behaving as intended
                            
                                Algorithm for clustering with minimum size constraints
                            
                                How to decode the IDAT chunk of a PNG file with no compression
                            
                                Finding nearest sumElement combination in list
                            
                                Big-O of log versus square root
                            
                                Algorithm implemented in ruby to add one to a number represented as an array
                            
                                How to perform cluster with weights/density in python? Something like kmeans with weights?
                            
                                Finding the most frequently occurring element in an SSE register

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How is Monte Carlo Tree Search implemented in practice

Tags:

algorithm

artificial-intelligence

simulation

montecarlo

monte-carlo-tree-search

rambossa

People also ask

1 Answers

Dennis Soemers

Recent Activity

Donate For Us