I am a little confused about how the MCTS "Tree Policy" is implemented. Every paper or article I read talks about going down the tree from the current game state (in MCTS terminology: the root for the player about to make a move). My question is: how am I selecting the best child even when I am at the MIN player's level (assuming I am the MAX player)? Even if I select some particular action that MIN might take, and my search tree gets deeper through that node, the MIN player during its turn might just as well choose a different node. (If the MIN player is an amateur human, it might well choose a node that is not the best.) This seems to make all of MAX's work in expanding through that node futile, since MIN has chosen a different node. For the steps I am referring to: https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ where the tree policy illustration, https://jeffbradberry.com/images/mcts_selection.png, makes me believe that they are executing it from a single player's perspective.
To implement MCTS for a two-player game, you can simply flip the sign of the reward at every step of back-propagation, a one-line change in the code.
This means we are trying to maximize reward at every layer, but as the reward is propagated up the tree, a positive reward for your opponent becomes a negative reward for you when it reaches your layer.
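Here is a minimal sketch of that back-propagation step in Python. The `Node` class and its field names are my own assumptions for illustration, not from any particular library; the convention is that each node stores rewards from the perspective of the player who made the move into it, so the selection step can maximize at every layer.

```python
import math


class Node:
    """A hypothetical MCTS node; field names are assumptions for this sketch."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        # Total reward from the perspective of the player who just moved
        # into this node (i.e., the player choosing at the parent).
        self.total_reward = 0.0

    def ucb1(self, exploration=1.41):
        # Because each node stores rewards from its parent's perspective,
        # the tree policy can simply pick the child maximizing this score
        # at every layer, MAX and MIN levels alike.
        if self.visits == 0:
            return math.inf
        return (self.total_reward / self.visits
                + exploration * math.sqrt(math.log(self.parent.visits)
                                          / self.visits))


def backpropagate(node, reward):
    """Walk from the simulated leaf back to the root, flipping the sign
    of the reward at each level so a win for one player counts as a loss
    for the other. `reward` starts from the perspective of the player
    who made the move into `node`."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        reward = -reward  # the one-line sign flip for two-player games
        node = node.parent
```

With this convention, selection never needs a special MIN case: each layer just maximizes its own stored statistics, and the sign flip during back-propagation takes care of the alternating perspectives.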