I was wondering how to deal with in-place operations in PyTorch. As I remember, using in-place operations with autograd has always been problematic.
And actually I'm surprised that the code below works; even though I haven't tested it, I believe this code would have raised an error in version 0.3.1.
Basically, what I want to do is set a certain position of a tensor vector to a certain value, like:
my_tensor[i] = 42
Working example code:
import torch

# test parameter a
a = torch.rand((2), requires_grad=True)
print('a ', a)
b = torch.rand(2)
# calculation
c = a + b
# performing in-place operation
c[0] = 0
print('c ', c)
s = torch.sum(c)
print('s ', s)
# calling backward()
s.backward()
# optimizer step
optim = torch.optim.Adam(params=[a], lr=0.5)
optim.step()
# changed parameter a
print('changed a', a)
Output:
a tensor([0.2441, 0.2589], requires_grad=True)
c tensor([0.0000, 1.1511], grad_fn=<CopySlices>)
s tensor(1.1511, grad_fn=<SumBackward0>)
changed a tensor([ 0.2441, -0.2411], requires_grad=True)
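For what it's worth, adding a print of a.grad right after s.backward() (my own addition, not in the run above) shows why only the second entry of a gets updated: overwriting c[0] removes its dependence on a[0], so the gradient there is zero.

print('grad ', a.grad)  # grad  tensor([0., 1.])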
So obviously in version 0.4.1 this works just fine, without warnings or errors.
Referring to this article in the documentation: autograd-mechanics
Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd’s aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you’re operating under heavy memory pressure, you might never need to use them.
But even though it works, the use of in-place operations is discouraged in most cases.
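For context, here is a minimal sketch (my own example, not from the documentation, error message abbreviated) of the kind of failure that warning is about: when an operation saves a tensor for its backward pass and that tensor is later modified in place, autograd detects the change and raises an error at backward time.

import torch

x = torch.rand(3, requires_grad=True)
y = torch.exp(x)       # exp saves its output for the backward pass
y.add_(1)              # in-place modification of that saved output
torch.sum(y).backward()
# RuntimeError: one of the variables needed for gradient computation
# has been modified by an inplace operation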
So my questions are:
How much does the usage of in-place operations affect performance?
How do I get around using in-place operations in such cases where I want to set one element of a tensor to a certain value?
Thanks in advance!
How autograd encodes the history: autograd is a reverse automatic differentiation system. Conceptually, autograd records a graph of all the operations that created the data as you execute them, giving you a directed acyclic graph whose leaves are the input tensors and whose roots are the output tensors.
To create a tensor with gradients, pass the extra argument requires_grad=True when creating it. requires_grad is a flag that controls whether a tensor requires a gradient or not. Only floating point and complex dtype tensors can require gradients.
Backward propagation: in backprop, the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent.
Every tensor produced by an operation has a grad_fn attribute that references the function that created it (tensors created by the user have None as their grad_fn).
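A quick sketch of these attributes (the printed representations below are approximate):

import torch

a = torch.rand(2, requires_grad=True)  # created by the user -> leaf tensor
b = a + 1                              # created by an operation -> has a grad_fn
print(a.is_leaf, a.grad_fn)            # True None
print(b.is_leaf, b.grad_fn)            # False <AddBackward0 object at 0x...>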
I am not sure how much in-place operations affect performance, but I can address the second question: you can use a mask instead of in-place ops.
import torch
import numpy as np

a = torch.rand(2, requires_grad=True)
print('a ', a)
b = torch.rand(2)
# calculation
c = a + b
# instead of the in-place assignment, zero out position 0 with a mask
mask = np.zeros(2)
mask[1] = 1
mask = torch.tensor(mask, dtype=torch.float32)  # match c's dtype
c = c * mask
...
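A multiplicative mask only works naturally for zeroing entries. If the goal is to set a position to an arbitrary value (like the my_tensor[i] = 42 in the question) without an in-place write, torch.where can build a new tensor instead; here is a sketch along those lines:

import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(2)
c = a + b

# boolean mask marking the position(s) to overwrite
idx = torch.tensor([True, False])
# build a new tensor instead of writing into c in place
c_new = torch.where(idx, torch.full_like(c, 42.0), c)

s = torch.sum(c_new)
s.backward()
print(a.grad)  # tensor([0., 1.]) -- no gradient flows into the overwritten entry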
This may be not a direct answer to your question, but just for information.
In-place operations work for non-leaf tensors in a computational graph.
Leaf tensors are tensors which are the 'ends' of a computational graph. Officially (from the is_leaf attribute documentation here),
For Tensors that have requires_grad which is True, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so grad_fn is None.
Example which works without error:
a = torch.tensor([3.,2.,7.], requires_grad=True)
print(a) # tensor([3., 2., 7.], requires_grad=True)
b = a**2
print(b) # tensor([ 9., 4., 49.], grad_fn=<PowBackward0>)
b[1] = 0
print(b) # tensor([ 9., 0., 49.], grad_fn=<CopySlices>)
c = torch.sum(2*b)
print(c) # tensor(116., grad_fn=<SumBackward0>)
c.backward()
print(a.grad) # tensor([12., 0., 28.])
On the other hand, in-place operations do not work for leaf tensors.
Example which causes error:
a = torch.tensor([3.,2.,7.], requires_grad=True)
print(a) # tensor([3., 2., 7.], requires_grad=True)
a[1] = 0
print(a) # tensor([3., 0., 7.], grad_fn=<CopySlices>)
b = a**2
print(b) # tensor([ 9., 0., 49.], grad_fn=<PowBackward0>)
c = torch.sum(2*b)
print(c) # tensor(116., grad_fn=<SumBackward0>)
c.backward() # Error occurs at this line.
# RuntimeError: leaf variable has been moved into the graph interior
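As an aside (my own addition, not part of the original answer): if you really do need to overwrite an entry of a leaf tensor in place, for example for manual initialization, the usual pattern is to do the write outside of autograd with torch.no_grad(), so it is never recorded in the graph:

import torch

a = torch.tensor([3., 2., 7.], requires_grad=True)
with torch.no_grad():
    a[1] = 0           # not recorded by autograd; a stays a leaf
b = a**2
c = torch.sum(2*b)
c.backward()
print(a.grad)  # tensor([12., 0., 28.])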
I suppose that the b[1] = 0 operation in the first example above is not really an in-place operation. I suppose that it creates a new tensor with a "CopySlices" operation. The 'old b' before the in-place operation might be kept internally (just its name being overwritten by the 'new b'). I found a nice figure here.
old b ---(CopySlices)----> new b
On the other hand, the tensor a is a leaf tensor. After the CopySlices operation a[1] = 0, it becomes an intermediate tensor. To avoid such a complicated mixture of leaf tensors and intermediate tensors when backpropagating, the CopySlices operation on leaf tensors is prohibited from coexisting with backward().
This is merely my personal opinion, so please refer to official documents.
Note:
Although in-place operations work for intermediate tensors, it is safer to use clone() and detach() as much as possible when you do in-place operations, to explicitly create a new tensor rather than mutating one that is part of the computational graph.
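For instance, here is a sketch (my own, extending the examples above) that clones the leaf first and performs the in-place write on the clone, which avoids the error from the second example while keeping the gradient flow into a:

import torch

a = torch.tensor([3., 2., 7.], requires_grad=True)
a_mod = a.clone()      # non-leaf copy that still backpropagates into a
a_mod[1] = 0           # in-place write on the clone, not on the leaf
c = torch.sum(2 * a_mod**2)
c.backward()
print(a.grad)  # tensor([12., 0., 28.])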