The difference between Q-learning and SARSA is that Q-learning's update bootstraps from the best possible action in the next state, whereas SARSA's update bootstraps from the action actually taken in the next state.
If a greedy selection policy is used, that is, the action with the highest action value is selected 100% of the time, are SARSA and Q-learning then identical?
Can you spot the difference? In Q-learning, we take the action using an epsilon-greedy policy, but when updating the Q-value we simply use the maximum-valued action in the next state. In SARSA, we take the action using the epsilon-greedy policy and, when updating the Q-value, we also use the action selected by that same epsilon-greedy policy.
SARSA vs Q-learning: the difference between these two algorithms is that SARSA updates its Q-values using the action chosen by the same current (behaviour) policy, whereas Q-learning updates using the greedy action, i.e. the action that gives the maximum Q-value in the next state, so its update target follows the current estimate of the optimal policy.
Q-learning is an off-policy algorithm. It estimates the return for state-action pairs under the optimal (greedy) policy, independently of the actions the agent actually takes. An off-policy algorithm approximates the optimal action-value function independently of the policy being followed.
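To make the contrast concrete, here is a minimal tabular sketch of the two update rules. The Q-table size, learning rate, discount, and exploration rate below are illustrative assumptions, not values from the original text:

```python
import numpy as np

# Hypothetical tabular setup (all sizes and hyperparameters are assumptions).
n_states, n_actions = 48, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(state):
    """Behaviour policy used by BOTH algorithms to pick the action to execute."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))     # explore
    return int(np.argmax(Q[state]))             # exploit

def q_learning_update(s, a, r, s_next):
    # Off-policy target: bootstrap from the BEST action in the next state,
    # regardless of which action the behaviour policy will actually take.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy target: bootstrap from the action the epsilon-greedy policy
    # actually selected for the next step.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```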
In the cliff-walking gridworld, SARSA learns the safe path along the top row of the grid because it takes the action-selection method into account when learning. Because it learns the safe path, SARSA actually receives a higher average reward per trial than Q-learning during training, even though it does not walk the optimal path.
If an optimal policy has already been formed, SARSA with pure greedy action selection and Q-learning are the same.
However, during training we only have a partial or sub-optimal policy, so SARSA with pure greedy action selection will only converge to the "best" sub-optimal policy it has found, without trying to explore the optimal one, while Q-learning can still improve because of the max operator in its update, which evaluates all available actions in the next state and bootstraps from the best one.
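As a quick illustration of the pure-greedy case, the following self-contained sketch uses an arbitrary, made-up Q-table to check that when the next action is chosen greedily (epsilon = 0), SARSA's bootstrap term Q[s', a'] equals Q-learning's max over Q[s', a], so the two updates coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((48, 4))    # arbitrary Q-table, purely for illustration

def greedy(state):
    # Pure greedy selection, i.e. epsilon-greedy with epsilon = 0.
    return int(np.argmax(Q[state]))

s_next = 7
a_next = greedy(s_next)
# SARSA's bootstrap term Q[s_next, a_next] equals Q-learning's max_a Q[s_next, a],
# so with a purely greedy behaviour policy the two update targets are identical.
assert Q[s_next, a_next] == np.max(Q[s_next])
```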