
PyTorch RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

This code is built up as follows: my robot takes a picture, and a TF computer vision model calculates where in the picture the target object starts. This information (the x1 and x2 coordinates) is passed to a PyTorch model, which should learn to predict the correct motor activations in order to get closer to the target. After the movement is executed, the robot takes another picture, and the TF CV model should calculate whether the motor activation brought the robot closer to the desired state (x1 at 10, x2 at 31).

However, every time I run the code, PyTorch is not able to calculate the gradients.

I'm wondering whether this is a data-type problem or a more general one: is it impossible to calculate the gradients if the loss is not calculated directly from the PyTorch network's output?

Any help and suggestions will be greatly appreciated.

#define policy model (model to learn a policy for my robot)
import torch
import torch.nn as nn
import torch.nn.functional as F 
class policy_gradient_model(nn.Module):
    def __init__(self):
        super(policy_gradient_model, self).__init__()
        self.fc0 = nn.Linear(2, 2)
        self.fc1 = nn.Linear(2, 32)
        self.fc2 = nn.Linear(32, 64)
        self.fc3 = nn.Linear(64,32)
        self.fc4 = nn.Linear(32,32)
        self.fc5 = nn.Linear(32, 2)
    def forward(self,x):
        x = self.fc0(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        return x

policy_model = policy_gradient_model().double()
print(policy_model)
optimizer = torch.optim.AdamW(policy_model.parameters(), lr=0.005, betas=(0.9,0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)

#make robot move as predicted by pytorch network (not all code included)
def move(motor_controls):
#define curvature
 #   motor_controls[0] = sigmoid(motor_controls[0])
    activation_left = 1+(motor_controls[0])*99
    activation_right = 1+(1- motor_controls[0])*99

    print("activation left:", activation_left, ". activation right:",activation_right, ". time:", motor_controls[1]*100)

#start movement

#main
import cv2
import numpy as np
import time
from torch.autograd import Variable
print("start training")
losses=[]
losses_end_of_epoch=[]
number_of_steps_each_epoch=[]
loss_function = nn.MSELoss(reduction='mean')

#each epoch
for epoch in range(2):
    count=0
    target_reached=False
    while target_reached==False:
        print("epoch: ", epoch, ". step:", count)
###process and take picture
        indices = process_picture()
###binary_network(sliced)=indices as input for policy model
        optimizer.zero_grad()
###output: 1 for curvature, 1 for duration of movement
        motor_controls = policy_model(Variable(torch.from_numpy(indices))).detach().numpy()
        print("NO TANH output for motor: 1)activation left, 2)time ", motor_controls)
        motor_controls[0] = np.tanh(motor_controls[0])
        motor_controls[1] = np.tanh(motor_controls[1])
        print("TANH output for motor: 1)activation left, 2)time ", motor_controls)
###execute suggested action
        move(motor_controls)
###take and process picture2 (after movement)
        indices = (process_picture())
###loss=(binary_network(picture2) - desired
        print("calculate loss")
        print("idx", indices, type(torch.tensor(indices)))
     #   loss = 0
      #  loss = (indices[0]-10)**2+(indices[1]-31)**2
       # loss = loss/2
        print("shape of indices", indices.shape)
        array=np.zeros((1,2))
        array[0]=indices
        print(array.shape, type(array))
        array2 = torch.ones([1,2])
        loss = loss_function(torch.tensor(array).double(), torch.tensor([[10.0,31.0]]).double()).float()
        print("loss: ", loss, type(loss), loss.shape)
       # array2[0] = loss_function(torch.tensor(array).double(), torch.tensor([[10.0,31.0]]).double()).float()
        losses.append(loss)
#start line causing the error-message (still part of main)
###calculate gradients
        loss.backward()
#end line causing the error-message (still part of main)

###apply gradients        
        optimizer.step()

#Output (so far as intended) (not all included)

#calculate loss
idx [14. 15.] <class 'torch.Tensor'>
shape of indices (2,)
(1, 2) <class 'numpy.ndarray'>
loss:  tensor(136.) <class 'torch.Tensor'> torch.Size([])

#Error Message:
Traceback (most recent call last):
  File "/home/pi/Desktop/GradientPolicyLearning/PolicyModel.py", line 259, in <module>
    array2.backward()
  File "/home/pi/.local/lib/python3.7/site-packages/torch/tensor.py", line 134, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/pi/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
YasarL asked May 14 '20 23:05



2 Answers

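Calling .detach() on the prediction returns a tensor that is cut off from the computation graph, so no gradients can flow back through it. Since you first get the motor controls from the model and then try to backpropagate the error, I would suggest keeping the attached prediction and detaching only a copy: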

prediction = policy_model(torch.from_numpy(indices))
motor_controls = prediction.clone().detach().numpy()

This keeps the prediction attached to the graph, so the error can be backpropagated through it.
Now you can do

loss = loss_function(prediction, torch.tensor([[10.0,31.0]]).double()).float()

Note: you might need to call .double() on the prediction if it throws a dtype error.
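
The suggestion above can be sketched end to end as follows. This is only a minimal illustration: TinyPolicy is a hypothetical stand-in for the network in the question, and the input coordinates are made up. The prediction stays attached to the graph for the loss, and only a detached copy is handed to NumPy for the robot.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the policy network in the question.
class TinyPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

policy_model = TinyPolicy().double()
optimizer = torch.optim.AdamW(policy_model.parameters(), lr=0.005)
loss_function = nn.MSELoss(reduction="mean")

indices = np.array([[14.0, 15.0]])             # made-up detector output, shape (1, 2)
target = torch.tensor([[10.0, 31.0]]).double()

optimizer.zero_grad()
prediction = policy_model(torch.from_numpy(indices))  # still attached to the graph
motor_controls = prediction.detach().numpy()          # detached copy, for the robot only

loss = loss_function(prediction, target)  # built from the attached tensor
loss.backward()                           # gradients reach the model parameters
optimizer.step()

print(loss.requires_grad, loss.grad_fn is not None)
```

Note that in this sketch the loss compares the network output with the target, as proposed above; it is not computed from the second picture.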

dumbPy answered Sep 22 '22 03:09


It is indeed impossible to calculate the gradients if the loss is not computed from the PyTorch network's output through differentiable operations, because autograd then cannot apply the chain rule, which is what it uses to compute the gradients with respect to the parameters.
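
A small sketch may make this concrete (the linear layer and the numbers are made up for the example). A loss built from a tensor created out of NumPy data has no grad_fn, so backward() raises the error from the question, whereas a loss computed from the network output does have one.

```python
import numpy as np
import torch
import torch.nn as nn

net = nn.Linear(2, 2).double()          # made-up stand-in for the policy network
target = torch.tensor([[10.0, 31.0]]).double()
mse = nn.MSELoss()

out = net(torch.tensor([[14.0, 15.0]]).double())

# Loss computed from the network output: connected to the parameters.
attached_loss = mse(out, target)

# Loss computed from a fresh tensor made out of NumPy data: no graph history.
observed = np.array([[14.0, 15.0]])     # e.g. coordinates measured after the move
detached_loss = mse(torch.tensor(observed), target)

print(attached_loss.requires_grad, attached_loss.grad_fn is not None)
print(detached_loss.requires_grad, detached_loss.grad_fn)

# Calling backward() on the detached loss reproduces the RuntimeError.
error_message = None
try:
    detached_loss.backward()
except RuntimeError as err:
    error_message = str(err)
print(error_message)
```

Checking requires_grad and grad_fn on the loss before calling backward() is a quick way to see whether it is still connected to the model.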

nsidn98 answered Sep 20 '22 03:09