I have a small, possibly naive question about Monte Carlo Tree Search (MCTS). I understand most of it, but while looking at some implementations I noticed that after MCTS is run for a given state and a best move is returned, the tree is thrown away. So for the next move, we have to run MCTS from scratch on the new state to find the next best move.
I was just wondering why we don't retain some of the information from the old tree. It seems like there is valuable information about the states in the old tree, especially since the best move is usually the one MCTS has explored the most. Is there any particular reason we can't use this old information in some useful way?
Some implementations do indeed retain the information.
For example, the AlphaGo Zero paper says:
"The search tree is reused at subsequent time-steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded."
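As a rough illustration of what that quote describes, here is a minimal sketch in Python. The `Node` class and the `advance_root` function are hypothetical names made up for this example, not taken from AlphaGo Zero or any particular library, and the sketch assumes each node stores its children in a dict keyed by action.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical MCTS node holding only the statistics needed here."""
    visits: int = 0
    total_value: float = 0.0
    # repr=False avoids infinite recursion when printing (parent <-> children).
    parent: Optional["Node"] = field(default=None, repr=False)
    children: Dict[Hashable, "Node"] = field(default_factory=dict)


def advance_root(root: Node, action: Hashable) -> Node:
    """Return the root to use for the next search after `action` is played.

    If the played action was expanded during the previous search, its child
    (with the whole subtree and its statistics) becomes the new root.
    Otherwise we fall back to a fresh, empty node.
    """
    child = root.children.get(action)
    if child is None:
        return Node()
    # Detach from the old tree so the siblings and the old root can be
    # garbage-collected once nothing else references them.
    child.parent = None
    return child
```

In a two-player game you would typically call this once for your own move and once more for the opponent's reply, then continue the search from the returned node. If the opponent plays a move the search never expanded, nothing can be reused and the search starts from an empty node.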