Understanding log_prob for Normal distribution in pytorch

Tags:

pytorch

reinforcement-learning

probability-distribution

I'm currently trying to solve Pendulum-v0 from the OpenAI Gym environment, which has a continuous action space. As a result, I need to use a Normal distribution to sample my actions. What I don't understand is the shape of the log_prob output when using it:

import torch
from torch.distributions import Normal 

means = torch.tensor([[0.0538],
        [0.0651]])
stds = torch.tensor([[0.7865],
        [0.7792]])

dist = Normal(means, stds)
a = torch.tensor([1.2,3.4])
d = dist.log_prob(a)
print(d.size())

I was expecting a tensor of size 2 (one log_prob for each action), but it outputs a tensor of size (2, 2).

However, when using a Categorical distribution for a discrete environment, the log_prob has the expected size:

import torch
from torch.distributions import Categorical

logits = torch.tensor([[-0.0657, -0.0949],
        [-0.0586, -0.1007]])

dist = Categorical(logits=logits)
a = torch.tensor([1, 1])
print(dist.log_prob(a).size())

gives me a tensor of size (2).

Why does log_prob for the Normal distribution have a different size?

asked Mar 19 '20 20:03

Samuel Beaussant

1 Answer

If you look at the source code of torch.distributions.Normal and find the definition of the log_prob(value) function, you can see that the main part of the calculation is:

return -((value - self.loc) ** 2) / (2 * var) - some other part

where value holds the values for which you want to calculate the log probability (in your case, a), self.loc is the mean of the distribution (in your case, means), and var is the variance, that is, the square of the standard deviation (in your case, stds**2). This is indeed the logarithm of the probability density function of the normal distribution; the part not written out above consists of a constant and the logarithm of the standard deviation.

In the first example, you define means and stds as column vectors (shape (2, 1)), while the values in a form a 1-D tensor (shape (2,)), which broadcasts like a row vector:
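To make the omitted terms concrete, here is a minimal scalar sketch of the full log density in plain Python. The helper name normal_log_prob is hypothetical and not part of PyTorch; it only mirrors the structure of the formula described above.

```python
import math

def normal_log_prob(value: float, loc: float, scale: float) -> float:
    """Log density of a normal distribution with mean loc and std scale (scalars only)."""
    var = scale ** 2
    quadratic = -((value - loc) ** 2) / (2 * var)  # the term quoted in the answer
    log_scale = math.log(scale)                    # logarithm of the standard deviation
    constant = math.log(math.sqrt(2 * math.pi))    # normalising constant
    return quadratic - log_scale - constant

# Log density of the first action under the first distribution from the question.
print(normal_log_prob(1.2, 0.0538, 0.7865))
```

Because everything here is a scalar, no broadcasting can occur; the shape question only arises once value, loc and scale are tensors.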

means = torch.tensor([[0.0538],
    [0.0651]])
stds = torch.tensor([[0.7865],
    [0.7792]])
a = torch.tensor([1.2,3.4])

But value - self.loc subtracts a column vector from a row vector, and under PyTorch's broadcasting rules this gives a matrix (try it!). The result you obtain is therefore a log_prob value for each of your two distributions and for each of the values in a.
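A short sketch of that broadcasting, using the tensors from the question. The variable names diff and matched are only illustrative.

```python
import torch
from torch.distributions import Normal

means = torch.tensor([[0.0538], [0.0651]])  # shape (2, 1)
stds = torch.tensor([[0.7865], [0.7792]])   # shape (2, 1)
a = torch.tensor([1.2, 3.4])                # shape (2,)

diff = a - means    # (2,) broadcast against (2, 1) gives (2, 2)
print(diff.shape)   # torch.Size([2, 2])

d = Normal(means, stds).log_prob(a)
print(d.shape)      # torch.Size([2, 2])

# d[i, j] is the log density of a[j] under distribution i, so the
# matched pairs (a[i] under distribution i) sit on the diagonal.
matched = torch.diagonal(d)
print(matched.shape)  # torch.Size([2])
```

Taking the diagonal recovers the expected values, but it computes the unwanted cross terms first, so making the shapes agree up front is the cleaner fix.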

If you want to obtain a log_prob without the cross terms, then define the variables consistently, i.e., either

means = torch.tensor([[0.0538],
    [0.0651]])
stds = torch.tensor([[0.7865],
    [0.7792]])
a = torch.tensor([[1.2],[3.4]])

or

means = torch.tensor([0.0538,
    0.0651])
stds = torch.tensor([0.7865,
    0.7792])
a = torch.tensor([1.2,3.4])

This is what you do in your second example, which is why you obtain the result you expected.
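If the means and stds come out of a network with a trailing dimension of size 1, it may be more convenient to reshape at the call site than to redefine the tensors. The sketch below assumes exactly the shapes from the question; the names lp_flat and lp_alt are illustrative.

```python
import torch
from torch.distributions import Normal

means = torch.tensor([[0.0538], [0.0651]])  # e.g. policy output, shape (2, 1)
stds = torch.tensor([[0.7865], [0.7792]])   # shape (2, 1)
a = torch.tensor([1.2, 3.4])                # actions, shape (2,)

dist = Normal(means, stds)

# Option 1: give the actions the same trailing dimension as the parameters.
lp_column = dist.log_prob(a.unsqueeze(-1))  # shape (2, 1)
lp_flat = lp_column.squeeze(-1)             # shape (2,)

# Option 2: drop the trailing dimension from the parameters instead.
lp_alt = Normal(means.squeeze(-1), stds.squeeze(-1)).log_prob(a)  # shape (2,)

print(lp_flat.shape, lp_alt.shape)
print(torch.allclose(lp_flat, lp_alt))
```

Both options yield one log_prob per action, matching the diagonal of the (2, 2) result from the original code.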

answered Sep 28 '22 01:09

AndrisP
