
SARSA algorithm

I am having trouble understanding the SARSA algorithm: http://en.wikipedia.org/wiki/SARSA

In particular, when updating the Q value, what is gamma, and what values are used for s(t+1) and a(t+1)?

Can someone explain this algorithm to me?

Thanks.

Neutralise, asked Oct 10 '22 23:10


1 Answer

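Gamma is the discount factor: it determines how much future rewards count relative to the immediate one. With gamma = 0.0, only the immediate reward r(t+1) enters the update; as gamma approaches 1.0, rewards far in the future weigh almost as much as immediate ones. How strongly each new experience overwrites the old estimate is controlled by a separate parameter, the learning rate alpha. If you set alpha to 0.0, the algorithm will not update the value function Q at all. If you set it to 1.0, each new experience completely replaces the previous estimate. In practice both are usually set somewhere in between and tuned experimentally.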
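
To make the roles of the two parameters concrete, here is a minimal sketch of a single update in the standard SARSA form, Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a)). The function name `sarsa_update` and all the numbers are made up for illustration.

```python
def sarsa_update(q_sa, reward, q_next, alpha, gamma):
    """Return the new estimate of Q(s, a) after one SARSA update.

    alpha is the learning rate (step size), gamma the discount factor.
    """
    td_target = reward + gamma * q_next        # discounted one-step target
    return q_sa + alpha * (td_target - q_sa)   # move the old estimate toward it


# Hypothetical values: old estimate, observed reward, Q of the next state-action pair.
old_q, reward, next_q = 2.0, 1.0, 4.0

no_learning = sarsa_update(old_q, reward, next_q, alpha=0.0, gamma=0.9)  # stays at old_q
overwrite = sarsa_update(old_q, reward, next_q, alpha=1.0, gamma=0.9)    # becomes the target
myopic = sarsa_update(old_q, reward, next_q, alpha=1.0, gamma=0.0)       # ignores next_q
halfway = sarsa_update(old_q, reward, next_q, alpha=0.5, gamma=0.9)      # midpoint of old and target
```

With alpha = 0.0 the estimate never moves, with alpha = 1.0 it jumps straight to the target, and with gamma = 0.0 the target collapses to the immediate reward.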

Here is how it works:

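  • In your first step, you just get a state. Simply store it away as s(t). Then use your value function to choose an action for this state (usually the best-valued one, with an occasional random choice for exploration) and store it as a(t).
  • In each subsequent step, you get r(t+1) and s(t+1). Again, use your value function to choose an action, a(t+1). These are the values the update needs: s(t+1) is the state you actually landed in, and a(t+1) is the action you are actually about to take there. The value of the transition from your previous action to the new one is r(t+1) + gamma*Q(s(t+1), a(t+1)) - Q(s(t), a(t)). Add this quantity, scaled by the learning rate alpha, to Q(s(t), a(t)), your long-term estimate of the previous action's value. Finally, store s(t+1) and a(t+1) as s(t) and a(t) for the next step.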
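
The steps above can be sketched as a short program. This is only an illustration under stated assumptions: the five-state chain environment, the epsilon-greedy action choice, and every name and parameter value here (`step`, `choose_action`, `run_sarsa`, alpha = 0.1, gamma = 0.9, epsilon = 0.1) are invented for the example and are not part of the algorithm's definition.

```python
import random

N_STATES = 5        # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)    # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: reward 1.0 on reaching the right end, 0.0 otherwise."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return (1.0 if done else 0.0), next_state, done


def choose_action(Q, state, epsilon, rng):
    """Epsilon-greedy: usually the best-valued action, sometimes a random one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])


def run_sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        # First step: observe a state and pick an action for it.
        s = 0
        a = choose_action(Q, s, epsilon, rng)
        done = False
        while not done:
            # Each subsequent step: observe the reward and the next state ...
            r, s_next, done = step(s, a)
            # ... and pick the action that will actually be taken there.
            a_next = choose_action(Q, s_next, epsilon, rng)
            # A terminal state has no future value.
            q_next = 0.0 if done else Q[s_next][a_next]
            # Update the estimate for the previous state-action pair.
            Q[s][a] += alpha * (r + gamma * q_next - Q[s][a])
            # Store the new pair as the current one for the next step.
            s, a = s_next, a_next
    return Q


Q = run_sarsa()
```

Note that the update uses the same a(t+1) that is executed on the next step; this is what makes SARSA on-policy. On this toy chain one would expect the "right" action to end up with the higher value in the non-terminal states, though the exact numbers depend on the seed and parameters.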

In effect, Q(s, a) is just a running average, kept separately for every state-action pair, of the targets r(t+1) + gamma*Q(s(t+1), a(t+1)) observed after taking action a in state s.
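
The running-average view can be checked with a small sketch. It applies the same incremental rule, estimate += step_size * (sample - estimate), to a made-up list of targets; the function name `incremental_average` and the sample values are assumptions for illustration only.

```python
def incremental_average(samples, step_size):
    """Fold samples into an estimate with: estimate += step * (x - estimate)."""
    estimate = 0.0
    for n, x in enumerate(samples, start=1):
        estimate += step_size(n) * (x - estimate)
    return estimate


targets = [1.0, 3.0, 2.0, 6.0]  # hypothetical update targets for one state-action pair

# A step size of 1/n reproduces the ordinary sample mean.
mean_estimate = incremental_average(targets, lambda n: 1.0 / n)

# A constant step size gives an average that weights recent samples more heavily.
recency_weighted = incremental_average(targets, lambda n: 0.5)
```

A constant learning rate therefore behaves like the second case: older experiences fade out geometrically, which is useful because the targets themselves change as Q improves.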

Don Reba, answered Nov 03 '22 21:11