I've been struggling to understand the differences between .clone(), .detach() and copy.deepcopy when using PyTorch, in particular with PyTorch tensors.
I tried writing out all my questions about their differences and use cases and quickly became overwhelmed, and realized that perhaps tracking the 4 main properties of PyTorch tensors would clarify which one to use much better than going through every small question. The 4 main properties I realized one needs to keep track of are:
requires_grad, shape, is_leaf, etc.
According to what I mined out from the PyTorch forums and the documentation, these are my current distinctions for each when used on tensors:
For clone:
x_cloned = x.clone()
I believe this is how it behaves according to the main 4 properties:
- x_cloned has its own Python reference/pointer to the new object
- it has created its own new tensor object with the same data as x
- it records the clone operation as .grad_fn=<CloneBackward>
It seems that the main use of this, as I understand it, is to create copies of things so that in-place operations are safe. In addition, coupled with .detach() as .detach().clone() (the "better" order to do it, btw), it creates a completely new tensor that has been detached from the old history and thus stops gradient flow through that path.
For detach:
x_detached = x.detach()
- it creates a new Python reference (the only one that does not is doing x_new = x, of course); one can use id() to check this, I believe
- it has created its own new tensor object x_detached with the same data as x
I believe the only sensible use I know of is creating new copies with their own memory when coupled with .clone(), as .detach().clone(). Otherwise, I am not sure what the use is. Since it points to the original data, doing in-place ops might be potentially dangerous (since it changes the old data, but the change to the old data is NOT known by autograd in the earlier computation graph).
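To illustrate the shared-memory point, here is my own toy example:

import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
x_detached = x.detach()

print(x_detached is x)                        # False -> it is a new tensor object
print(x_detached.requires_grad)               # False -> cut off from autograd
print(x_detached.data_ptr() == x.data_ptr())  # True -> it shares storage with x

x_detached[0] = 99.0   # in-place change on the detached tensor...
print(x)               # ...is silently visible through x: tensor([99., 2., 3.], requires_grad=True)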
For deepcopy:
x_deepcopy = copy.deepcopy(x)
I don't really see a use case for this. I assume anyone trying to use this really meant 1) .detach().clone(), or just 2) .clone() by itself, depending on whether one wants to stop gradient flow to the earlier graph with 1), or just wants to replicate the data with new memory with 2).
So this is the best way I have to understand the differences as of now, rather than asking about all the different scenarios in which one might use them.
So is this right? Does anyone see any major flaw that needs to be corrected?
My own worry is about the semantics I gave to deepcopy, and whether it's correct with respect to deep copying the history.
I think a list of common use cases for each would be wonderful.
These are all the resources I've read and discussions I've participated in to arrive at the conclusions in this question:
Returns a copy of input. This function is differentiable, so gradients will flow back from the result of this operation to input.
detach() Returns a new Tensor, detached from the current graph. The result will never require gradient. This method also affects forward mode AD gradients and the result will never have forward mode AD gradients. Returned Tensor shares the same storage with the original one.
inplace operation: It detects that 'a' has changed inplace and this will trip gradient calculation. It is because .detach() doesn't implicitly create a copy of the tensor, so when the tensor is modified later, it's updating the tensor on the upstream side of .detach() too.
detach() is used to detach a tensor from the current computational graph. It returns a new tensor that doesn't require a gradient.
Note: Since this question was posted, the behaviour and doc pages for these functions have been updated.
torch.clone()
Copies the tensor while maintaining a link in the autograd graph. To be used if you want to e.g. duplicate a tensor as an operation in a neural network (for example, passing a mid-level representation to two different heads for calculating different losses):
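A minimal sketch of that pattern (the layer sizes and losses here are placeholders, not from the original post):

import torch
import torch.nn as nn

backbone = nn.Linear(10, 16)
head1 = nn.Linear(16, 1)
head2 = nn.Linear(16, 1)

features = backbone(torch.rand(4, 10))

# clone() keeps the autograd link, so both branches backprop into the backbone
loss1 = head1(features.clone()).mean()
loss2 = head2(features.clone()).mean()
(loss1 + loss2).backward()

print(backbone.weight.grad is not None)  # True -> gradients flowed back through both clones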
Returns a copy of input.
NOTE: This function is differentiable, so gradients will flow back from the result of this operation to input. To create a tensor without an autograd relationship to input, see detach().
torch.Tensor.detach()
Returns a view of the original tensor without the autograd history. To be used if you want to manipulate the values of a tensor (not in place) without affecting the computational graph (e.g. reporting values midway through the forward pass).
Returns a new Tensor, detached from the current graph.
The result will never require gradient.
This method also affects forward mode AD gradients and the result will never have forward mode AD gradients.
NOTE: Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks. [1]
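A minimal sketch of that kind of use (the toy two-layer model here is illustrative):

import torch
import torch.nn as nn

layer1 = nn.Linear(10, 5)
layer2 = nn.Linear(5, 1)

hidden = layer1(torch.rand(3, 10))

# Report/log the intermediate values without touching the computational graph
hidden_report = hidden.detach()
print(hidden_report.mean().item(), hidden_report.std().item())

out = layer2(hidden)
out.sum().backward()   # gradients still flow normally through `hidden`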
copy.deepcopy
deepcopy is a generic Python function from the copy library which makes a copy of an existing object (recursively, if the object itself contains objects).
This is used (as opposed to more usual assignment) when the underlying object you wish to make a copy of is mutable (or contains mutables) and would be susceptible to mirroring changes made in one:
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.
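For example, with plain Python objects (nothing PyTorch-specific here):

import copy

a = [[1, 2], [3, 4]]
b = a                  # plain assignment: b is the same object as a
c = copy.deepcopy(a)   # recursive copy: new outer list and new inner lists

a[0][0] = 99
print(b[0][0])   # 99 -> mirrors the change, since b and a are one object
print(c[0][0])   # 1  -> unaffected, c has its own copies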
In a PyTorch setting, as you say, if you want a fresh copy of a tensor object to use in a completely different setting with no relationship or effect on its parent, you should use .detach().clone().
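A small sketch of that, using a tensor that sits inside a graph (the values are arbitrary):

import torch

x = torch.rand(3, requires_grad=True) * 2   # non-leaf tensor with autograd history
y = x.detach().clone()                      # fresh copy: new memory, no autograd history

print(y.requires_grad, y.grad_fn)           # False None
print(y.data_ptr() == x.data_ptr())         # False -> separate storage
y[0] = -1.0                                 # safe: does not affect x or its graph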
[1] IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as resize_ / resize_as_ / set_ / transpose_) to the returned tensor also update the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as zero_ / copy_ / add_) to the returned tensor will not update the original tensor anymore, and will instead trigger an error.
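As a quick illustration of that note (on recent PyTorch versions this raises a RuntimeError; the exact message may vary by version):

import torch

x = torch.rand(2, 3, requires_grad=True)
d = x.detach()

try:
    d.resize_(6)   # in-place size/storage change on the returned tensor
except RuntimeError as e:
    print("resize_ on the detached tensor raised:", e)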