I can't understand how to update Q values for a tic-tac-toe game. I have read a lot about it, but I can't picture how to do it. I read that the Q value is updated at the end of the game, but I don't understand: is there a Q value for each action?
You have a Q value for each state-action pair. You update one Q value after every action you perform. More precisely, if applying action a1 from state s1 gets you into state s2 and brings you some reward r, then you update Q(s1, a1) as follows:
Q(s1, a1) = Q(s1, a1) + learning_rate * (r + discount_factor * max Q(s2, _) - Q(s1, a1))
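As a minimal sketch, the update above could be written in Python as follows. The dictionary-based Q table, the function name `q_update`, the `next_actions` parameter, and the default values for `learning_rate` and `discount_factor` are assumptions made for illustration. `max Q(s2, _)` is read as the maximum over all actions available in s2, and is taken to be 0 when s2 is terminal.

```python
from collections import defaultdict

# Q table: maps (state, action) to a value; unseen pairs default to 0.0.
Q = defaultdict(float)

def q_update(Q, s1, a1, r, s2, next_actions,
             learning_rate=0.1, discount_factor=0.9):
    """Apply one Q-learning update for the transition (s1, a1) -> s2 with reward r."""
    # Best value reachable from s2; a terminal state has no actions, so use 0.0.
    best_next = max((Q[(s2, a)] for a in next_actions), default=0.0)
    td_target = r + discount_factor * best_next
    Q[(s1, a1)] += learning_rate * (td_target - Q[(s1, a1)])
    return Q[(s1, a1)]
```

For tic-tac-toe, a state could be any hashable encoding of the board (for example a 9-character string) and an action the index of the square to mark.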
In many games, such as tic-tac-toe, you don't get a reward until the end of the game, which is why you have to run the algorithm over many episodes. That is how information about the utility of final states is propagated back to earlier states.
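To make the propagation concrete, here is a toy sketch rather than a tic-tac-toe implementation. It assumes a fixed line of three moves with a single action per state (so `max Q(s2, _)` is just the next entry in the list), a reward of 1 only on the final move, and arbitrary values for `learning_rate` and `discount_factor`. The name `run_episode` is hypothetical.

```python
def run_episode(Q, n_moves, learning_rate=0.5, discount_factor=0.9):
    """Replay a fixed line of n_moves moves, updating Q in forward (play) order."""
    for s in range(n_moves):
        last = (s == n_moves - 1)
        r = 1.0 if last else 0.0               # reward only on the final move
        best_next = 0.0 if last else Q[s + 1]  # single action per state
        Q[s] += learning_rate * (r + discount_factor * best_next - Q[s])

Q = [0.0] * 3
history = []
for episode in range(3):
    run_episode(Q, 3)
    history.append(list(Q))  # snapshot of Q after each episode
```

After the first episode only the last move has a non-zero value; the second episode carries it one move earlier, and only the third episode reaches the first move. Each episode moves the information back by a single step.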
The problem with the standard Q Learning algorithm is that it just takes too long to propagate the values from the final to the first move because you only know the outcome of the game at the end of it.
Therefore the Q Learning algorithm should be modified. The following paper gives some details on possible modifications:
Abstract:
This paper reports our experiment on applying Q Learning algorithm for learning to play Tic-tac-toe. The original algorithm is modified by updating the Q value only when the game terminates, propagating the update process from the final move backward to the first move, and incorporating a new update rule. We evaluate the agent performance using full-board and partial-board representations. In this evaluation, the agent plays the tic-tac-toe game against human players. The evaluation results show that the performance of modified Q Learning algorithm with partial-board representation is comparable to that of human players.
Learning to Play Tic-Tac-Toe (2009) by Dwi H. Widyantoro & Yus G. Vembrina
(Unfortunately it is behind a paywall. Either you have access to the IEEE archive or you can ask the authors to provide a copy on researchgate: https://www.researchgate.net/publication/251899151_Learning_to_play_Tic-tac-toe)
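The general idea of updating only when the game terminates, from the final move backward to the first, might be sketched like this. This is not the paper's method: the paper's new update rule is not reproduced here, and the sketch simply applies the standard Q-learning update from above in reverse order over a recorded game. The function name `update_backward`, the tuple layout of `episode`, and the parameter values are assumptions.

```python
from collections import defaultdict

def update_backward(Q, episode, learning_rate=0.5, discount_factor=0.9):
    """Update Q from the last recorded move to the first.

    episode: list of (state, action, reward, next_state, next_actions)
    tuples in the order the moves were played.
    """
    for s1, a1, r, s2, next_actions in reversed(episode):
        # Terminal next states have no actions, so their best value is 0.0.
        best_next = max((Q[(s2, a)] for a in next_actions), default=0.0)
        Q[(s1, a1)] += learning_rate * (r + discount_factor * best_next - Q[(s1, a1)])
    return Q

Q = defaultdict(float)
episode = [
    ("s0", "a", 0.0, "s1", ["a"]),
    ("s1", "a", 0.0, "s2", ["a"]),
    ("s2", "a", 1.0, "end", []),   # final move: the only one with a reward
]
update_backward(Q, episode)
```

Because each move is updated after its successor, the reward from the final move already reaches the first move after a single game.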