What is the difference between detach, clone and deepcopy in Pytorch tensors in detail?

I've been struggling to understand the differences between .clone(), .detach() and copy.deepcopy when using PyTorch, in particular with PyTorch tensors.

I tried writing down all my questions about their differences and use cases and quickly became overwhelmed. I realized that perhaps pinning down the 4 main properties of PyTorch tensors would clarify which one to use much better than going through every small question. The 4 main properties one needs to keep track of are (a small sketch for checking them follows the list):

  1. whether one has a new pointer/reference to a tensor
  2. whether one has a new tensor object instance (and thus most likely this new instance has its own meta-data like requires_grad, shape, is_leaf, etc.)
  3. whether it has allocated new memory for the tensor data (i.e. whether the new tensor is a view of a different tensor)
  4. whether it's tracking the history of operations or not (or even whether it's tracking a completely new history of operations or the same old one, in the case of deep copy)
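
To make these four properties concrete, here is a small check I find useful (my own illustrative code, not from the docs), run on a leaf tensor:

import copy
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # a leaf tensor

x_cloned = x.clone()
x_detached = x.detach()
x_deepcopy = copy.deepcopy(x)  # note: deepcopy currently only works on leaf tensors

for name, t in [("clone", x_cloned), ("detach", x_detached), ("deepcopy", x_deepcopy)]:
    print(name,
          "new object:", t is not x,                        # properties 1 and 2
          "shares storage:", t.data_ptr() == x.data_ptr(),  # property 3
          "requires_grad:", t.requires_grad,
          "grad_fn:", t.grad_fn)                            # property 4

# clone:    new object, new storage, requires_grad=True, grad_fn=<CloneBackward...>
# detach:   new object, SAME storage, requires_grad=False, no grad_fn
# deepcopy: new object, new storage, requires_grad=True, no grad_fn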

According to what I mined out from the PyTorch forums and the documentation, these are my current distinctions for each when used on tensors:

Clone

For clone:

x_cloned = x.clone()

I believe this is how it behaves according to the main 4 properties:

  1. the cloned x_cloned has its own python reference/pointer to the new object
  2. it has created its own new tensor object instance (with its separate meta-data)
  3. it has allocated new memory for x_cloned with the same data as x
  4. it keeps the original history of operations and, in addition, records this clone operation as .grad_fn=<CloneBackward>

It seems that the main use of this, as I understand it, is to create copies of things so that in-place operations are safe. In addition, coupled with .detach() as .detach().clone() (the "better" order to do it, btw), it creates a completely new tensor that has been detached from the old history and thus stops gradient flow through that path.
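
A small sketch of that behaviour (illustrative code, assuming a leaf tensor x):

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
x_cloned = x.clone()            # new memory, but still in the graph (grad_fn=<CloneBackward...>)

loss = (x_cloned ** 2).sum()
loss.backward()
print(x.grad)                   # tensor([2., 4.]) -- gradients flow back through the clone

x_cloned.add_(10.0)             # in-place change on the clone...
print(x)                        # ...leaves the original data untouched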

Detach

x_detached = x.detach()
  1. it creates a new python reference (the only case that does not is plain assignment x_new = x, of course); one can check this with id()
  2. it has created its own new tensor object instance (with its separate meta-data)
  3. it has NOT allocated new memory: x_detached shares the same underlying data as x
  4. it cuts the history of the gradients and does not allow gradients to flow through it; I think it's right to think of it as having no history, like a brand new tensor

I believe the only sensible use I know of is creating new copies with their own memory when coupled with .clone() as .detach().clone(). Otherwise, I am not sure what the use is. Since it points to the original data, doing in-place ops might be potentially dangerous (they change the old data, but the change is NOT known to autograd in the earlier computation graph).
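
A small sketch of that shared-storage behaviour (illustrative code; sigmoid is just an example of an op that saves its output for backward):

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.sigmoid()                 # sigmoid saves its output for the backward pass
y_detached = y.detach()

print(y_detached.data_ptr() == y.data_ptr())  # True: detach shares the storage

y_detached.zero_()              # the in-place edit is visible through y as well
print(y)                        # y now holds zeros

# y.sum().backward()            # would now typically raise a RuntimeError, because a tensor
#                               # needed for gradient computation was modified in place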

copy.deepcopy

x_deepcopy = copy.deepcopy(x)
  1. it creates a new pointer/reference to a new tensor object
  2. it creates a new tensor instance with its own meta-data (all of the meta-data should point to deep copies, so new objects, if it's implemented as one would expect, I hope)
  3. it has its own memory allocated for the tensor data
  4. if it truly is a deep copy, I would expect a deep copy of the history, i.e. a deep replication of the history; this seems really expensive, but at least semantically consistent with what deep copy should be

I don't really see a use case for this. I assume anyone trying to use this really meant either 1) .detach().clone() or 2) just .clone() by itself, depending on whether one wants to stop gradient flow to the earlier graph (1) or just wants to replicate the data with new memory (2).
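
For completeness, a small sketch contrasting the two on a leaf tensor (illustrative code), plus a note on what happens to the history:

import copy
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)     # a leaf tensor

x_deep = copy.deepcopy(x)       # new memory, keeps requires_grad=True, no link to x's graph
x_fresh = x.detach().clone()    # new memory, requires_grad=False, no history at all

print(x_deep.requires_grad, x_fresh.requires_grad)   # True False
print(x_deep.data_ptr() == x.data_ptr())             # False: its own storage

# As for deep-copying the history: in the PyTorch versions I have checked, deepcopy of a
# non-leaf tensor (one with a grad_fn) simply raises an error, so no history is replicated:
y = x * 2
# copy.deepcopy(y)   # RuntimeError: Only Tensors created explicitly by the user
#                    # (graph leaves) support the deepcopy protocol at the moment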

So this is the best way I have of understanding the differences as of now, rather than asking about all the different scenarios in which one might use them.

So is this right? Does anyone see any major flaw that needs to be corrected?

My own worry is about the semantics I gave to deep copy, and I wonder if it's correct w.r.t. deep copying the history.

I think a list of common use cases for each would be wonderful.


Resources

These are all the resources I've read and participated in to arrive at the conclusions in this question:

  • Migration guide to 0.4.0 https://pytorch.org/blog/pytorch-0_4_0-migration-guide/
  • Confusion about using clone: https://discuss.pytorch.org/t/confusion-about-using-clone/39673/3
  • Clone and detach in v0.4.0: https://discuss.pytorch.org/t/clone-and-detach-in-v0-4-0/16861/2
  • Docs for clone:
    • https://pytorch.org/docs/stable/tensors.html#torch.Tensor.clone
  • Docs for detach (search for the word "detach" in your browser; there is no direct link):
    • https://pytorch.org/docs/stable/tensors.html#torch.Tensor
  • Difference between detach().clone() and clone().detach(): https://discuss.pytorch.org/t/difference-between-detach-clone-and-clone-detach/34173
  • Why am I able to change the value of a tensor without the computation graph knowing about it in Pytorch with detach?
  • What is the difference between detach, clone and deepcopy in Pytorch tensors in detail?
  • Copy.deepcopy() vs clone() https://discuss.pytorch.org/t/copy-deepcopy-vs-clone/55022/10
asked Jun 17 '20 by Charlie Parker



1 Answer

Note: Since this question was posted, the behaviour and doc pages for these functions have been updated.


torch.clone()

Copies the tensor while maintaining a link in the autograd graph. To be used if you want to, e.g., duplicate a tensor as an operation in a neural network (for example, passing a mid-level representation to two different heads that calculate different losses):

Returns a copy of input.

NOTE: This function is differentiable, so gradients will flow back from the result of this operation to input. To create a tensor without an autograd relationship to input see detach().
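
For instance, a minimal sketch of the "two heads" use mentioned above (the module and head names are made up for illustration):

import torch
import torch.nn as nn

backbone = nn.Linear(8, 4)
head_a = nn.Linear(4, 2)
head_b = nn.Linear(4, 2)

features = backbone(torch.randn(1, 8))

# clone keeps the autograd link, so both losses backpropagate into the backbone
loss_a = head_a(features.clone()).sum()
loss_b = head_b(features.clone()).sum()
(loss_a + loss_b).backward()
print(backbone.weight.grad is not None)   # True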

torch.Tensor.detach()

Returns a view of the original tensor without the autograd history. To be used if you want to manipulate the values of a tensor (not in place) without affecting the computational graph (e.g. reporting values midway through the forward pass).

Returns a new Tensor, detached from the current graph.

The result will never require gradient.

This method also affects forward mode AD gradients and the result will never have forward mode AD gradients.

NOTE: Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks. 1
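
A small sketch of the "reporting values midway through the forward pass" use (illustrative code):

import torch

x = torch.randn(4, requires_grad=True)
hidden = torch.tanh(x)

running_stat = hidden.detach().mean().item()   # cheap to log: no autograd history attached
print(running_stat)

hidden.sum().backward()      # the graph through hidden is unaffected by the detach above
print(x.grad is not None)    # True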

copy.deepcopy

deepcopy is a generic Python function from the copy module which makes a copy of an existing object (recursively, if the object itself contains objects).

This is used (as opposed to the more usual assignment) when the underlying object you wish to make a copy of is mutable (or contains mutables) and would be susceptible to mirroring changes made in one:

Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.

In a PyTorch setting, as you say, if you want a fresh copy of a tensor object to use in a completely different setting with no relationship or effect on its parent, you should use .detach().clone().
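
A small sketch of the assignment-vs-deepcopy distinction from the quote above (illustrative code):

import copy
import torch

params = {"w": torch.ones(2)}

alias = params                    # plain assignment: just another name for the same dict
snapshot = copy.deepcopy(params)  # recursive copy: a new dict holding a new tensor

params["w"].fill_(5.0)            # mutate the original in place
print(alias["w"])                 # tensor([5., 5.]) -- the alias mirrors the change
print(snapshot["w"])              # tensor([1., 1.]) -- the deep copy is unaffected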


  1. IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as resize_ / resize_as_ / set_ / transpose_) to the returned tensor also update the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as zero_ / copy_ / add_) to the returned tensor will not update the original tensor anymore, and will instead trigger an error.

answered Oct 05 '22 by iacob