
What is a policy in reinforcement learning? [closed]

I've seen definitions such as:

A policy defines the learning agent's way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.

But I still don't fully understand it. What exactly is a policy in reinforcement learning?

asked Sep 17 '17 by Alexander Cyberman

People also ask

What are policies in reinforcement learning?

A policy is, therefore, a strategy that an agent uses in pursuit of goals. The policy dictates the actions that the agent takes as a function of the agent's state and the environment.

What does policy mean in RL?

Policy. A policy defines how an agent acts from a specific state. For a deterministic policy, it is the action taken at a specific state. For a stochastic policy, it is the probability of taking an action a given the state s.

What is on-policy and off-policy in reinforcement learning?

On-policy methods attempt to evaluate or improve the policy that is used to make decisions. In contrast, off-policy methods evaluate or improve a policy different from that used to generate the data.

What is the difference between action and policy?

What is the difference between a policy action and a regular action? A policy has no end date, but it is still technically an action. A policy is also normally set to reapply, so that if the change is reverted, it will be remade.


1 Answer

The definition is correct, though it may not be instantly obvious the first time you see it. Let me put it this way: a policy is an agent's strategy.

For example, imagine a world where a robot moves across the room and the task is to get to the target point (x, y), where it gets a reward. Here:

  • A room is an environment
  • Robot's current position is a state
  • A policy is what an agent does to accomplish this task:

    • dumb robots just wander around randomly until they accidentally end up in the right place (policy #1)
    • others may, for some reason, learn to go along the walls most of the route (policy #2)
    • smart robots plan the route in their "head" and go straight to the goal (policy #3)
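To make the idea concrete, here is a minimal sketch (all names and coordinates are illustrative, not from any particular library) of policies #1 and #3 written as functions mapping a state, the robot's position, to an action:

```python
import random

ACTIONS = ["up", "down", "left", "right"]
GOAL = (3, 4)  # the target point (x, y)

def random_policy(state):
    """Policy #1: wander around randomly, ignoring the state entirely."""
    return random.choice(ACTIONS)

def greedy_policy(state):
    """Policy #3: head straight toward the goal, one axis at a time."""
    x, y = state
    gx, gy = GOAL
    if x < gx:
        return "right"
    if x > gx:
        return "left"
    if y < gy:
        return "up"
    return "down"

print(greedy_policy((0, 4)))  # "right", since the goal lies to the right
```

Both are policies in the formal sense: each one answers "given this state, what do I do?". They just differ wildly in quality.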

Obviously, some policies are better than others, and there are multiple ways to assess them, namely the state-value function and the action-value function. The goal of RL is to learn the best policy. Now the definition should make more sense (note that in this context, "time" is better understood as "state"):

A policy defines the learning agent's way of behaving at a given time.

Formally

More formally, we should first define a Markov Decision Process (MDP) as a tuple (S, A, P, R, γ), where:

  • S is a finite set of states
  • A is a finite set of actions
  • P is a state transition probability matrix (probability of ending up in a state for each current state and each action)
  • R is a reward function, given a state and an action
  • γ is a discount factor, between 0 and 1
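As an illustration, here is a toy two-state MDP written out as plain Python dicts (the states, actions, and all the numbers are made up for the example):

```python
# A toy MDP (S, A, P, R, gamma) with two states and two actions.
S = ["s0", "s1"]
A = ["stay", "go"]

# P[s][a] is a distribution over next states: P(s' | s, a).
# Each inner dict sums to 1.
P = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 0.9, "s0": 0.1}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}},
}

# R[s][a] is the expected reward for taking action a in state s.
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 0.5, "go": 0.0},
}

gamma = 0.9  # discount factor
```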

Then, a policy π is a probability distribution over actions given states. That is, it gives the likelihood of every action when the agent is in a particular state (of course, I'm skipping a lot of details here). This definition corresponds to the second part of your quote.
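In code, a stochastic policy for the toy states above could be a table of π(a | s), from which the agent samples its next action (again, the numbers are purely illustrative):

```python
import random

# A stochastic policy: pi[s][a] = probability of taking action a in state s.
pi = {
    "s0": {"stay": 0.2, "go": 0.8},
    "s1": {"stay": 1.0, "go": 0.0},
}

def sample_action(pi, state):
    """Draw an action according to pi(. | state)."""
    actions, probs = zip(*pi[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

a = sample_action(pi, "s0")  # "stay" or "go", mostly "go"
```

A deterministic policy is just the special case where one action per state has probability 1, as in state "s1" here.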

I highly recommend David Silver's RL course available on YouTube. The first two lectures focus particularly on MDPs and policies.

answered Oct 02 '22 by Maxim