I've been struggling to understand the differences between .clone(), .detach() and copy.deepcopy when using PyTorch, in particular with PyTorch tensors.
I tried writing out all my questions about their differences and use cases and quickly became overwhelmed, and realized that perhaps tracking the 4 main properties of PyTorch tensors would clarify which one to use much better than going through every small question. The 4 main properties I realized one needs to keep track of are:
requires_grad, shape, is_leaf, etc.
According to what I mined out from the PyTorch forums and the documentation, these are my current distinctions for each when used on tensors:
For clone:
x_cloned = x.clone()
I believe this is how it behaves according to the main 4 properties:
- x_cloned has its own Python reference/pointer to the new object
- it has created its own new tensor object with the same data as x
- it records the clone operation as .grad_fn=<CloneBackward>
It seems that the main use of this, as I understand it, is to create copies of things so that in-place operations are safe. In addition, coupled with .detach() as .detach().clone() (the "better" order to do it, btw), it creates a completely new tensor that has been detached from the old history and thus stops gradient flow through that path.
For detach:
x_detached = x.detach()
- it creates a new Python reference (the only one that does not is doing x_new = x, of course); one can use id() to check this, I believe
- it has created its own new tensor object x_detached with the same data as x
I believe the only sensible use I know of is creating new copies with their own memory when coupled with .clone(), as .detach().clone(). Otherwise, I am not sure what the use is. Since it points to the original data, doing in-place ops might be potentially dangerous (since it changes the old data, but the change to the old data is NOT known by autograd in the earlier computation graph).
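To illustrate the shared-memory point, here is my own toy example:

import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
x_detached = x.detach()

print(x_detached is x)                        # False -> it is a new tensor object
print(x_detached.requires_grad)               # False -> cut off from autograd
print(x_detached.data_ptr() == x.data_ptr())  # True -> it shares storage with x

x_detached[0] = 99.0   # in-place change on the detached tensor...
print(x)               # ...is silently visible through x: tensor([99., 2., 3.], requires_grad=True)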
For deepcopy:
x_deepcopy = copy.deepcopy(x)
I don't really see a use case for this. I assume anyone trying to use this really meant 1) .detach().clone(), or just 2) .clone() by itself, depending on whether one wants to stop gradient flow to the earlier graph with 1), or just wants to replicate the data with new memory with 2).
So this is the best way I have to understand the differences as of now, rather than asking about all the different scenarios in which one might use them.
So is this right? Does anyone see any major flaw that needs to be corrected?
My own worry is about the semantics I gave to deepcopy, and whether it's correct with respect to deep copying the history.
I think a list of common use cases for each would be wonderful.
These are all the resources I've read and discussions I've participated in to arrive at the conclusions in this question:
Returns a copy of input. This function is differentiable, so gradients will flow back from the result of this operation to input.
detach() Returns a new Tensor, detached from the current graph. The result will never require gradient. This method also affects forward mode AD gradients and the result will never have forward mode AD gradients. Returned Tensor shares the same storage with the original one.
inplace operation: It detects that 'a' has changed inplace and this will trip gradient calculation. It is because .detach() doesn't implicitly create a copy of the tensor, so when the tensor is modified later, it's updating the tensor on the upstream side of .detach() too.
detach() is used to detach a tensor from the current computational graph. It returns a new tensor that doesn't require a gradient.
Note: Since this question was posted, the behaviour and doc pages for these functions have been updated.
torch.clone()
Copies the tensor while maintaining a link in the autograd graph. To be used if you want to e.g. duplicate a tensor as an operation in a neural network (for example, passing a mid-level representation to two different heads for calculating different losses):
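A minimal sketch of that pattern (the layer sizes and losses here are placeholders, not from the original post):

import torch
import torch.nn as nn

backbone = nn.Linear(10, 16)
head1 = nn.Linear(16, 1)
head2 = nn.Linear(16, 1)

features = backbone(torch.rand(4, 10))

# clone() keeps the autograd link, so both branches backprop into the backbone
loss1 = head1(features.clone()).mean()
loss2 = head2(features.clone()).mean()
(loss1 + loss2).backward()

print(backbone.weight.grad is not None)  # True -> gradients flowed back through both clones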
Returns a copy of input.
NOTE: This function is differentiable, so gradients will flow back from the result of this operation to input. To create a tensor without an autograd relationship to input, see detach().
torch.Tensor.detach()
Returns a view of the original tensor without the autograd history. To be used if you want to manipulate the values of a tensor (not in place) without affecting the computational graph (e.g. reporting values midway through the forward pass).
Returns a new Tensor, detached from the current graph.
The result will never require gradient.
This method also affects forward mode AD gradients and the result will never have forward mode AD gradients.
NOTE: Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks. [1]
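A minimal sketch of that kind of use (the toy two-layer model here is illustrative):

import torch
import torch.nn as nn

layer1 = nn.Linear(10, 5)
layer2 = nn.Linear(5, 1)

hidden = layer1(torch.rand(3, 10))

# Report/log the intermediate values without touching the computational graph
hidden_report = hidden.detach()
print(hidden_report.mean().item(), hidden_report.std().item())

out = layer2(hidden)
out.sum().backward()   # gradients still flow normally through `hidden`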
copy.deepcopy
deepcopy is a generic Python function from the copy library which makes a copy of an existing object (recursively, if the object itself contains objects).
This is used (as opposed to more usual assignment) when the underlying object you wish to make a copy of is mutable (or contains mutables) and would be susceptible to mirroring changes made in one:
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.
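For example, with plain Python objects (nothing PyTorch-specific here):

import copy

a = [[1, 2], [3, 4]]
b = a                  # plain assignment: b is the same object as a
c = copy.deepcopy(a)   # recursive copy: new outer list and new inner lists

a[0][0] = 99
print(b[0][0])   # 99 -> mirrors the change, since b and a are one object
print(c[0][0])   # 1  -> unaffected, c has its own copies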
In a PyTorch setting, as you say, if you want a fresh copy of a tensor object to use in a completely different setting with no relationship or effect on its parent, you should use .detach().clone().
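A small sketch of that, using a tensor that sits inside a graph (the values are arbitrary):

import torch

x = torch.rand(3, requires_grad=True) * 2   # non-leaf tensor with autograd history
y = x.detach().clone()                      # fresh copy: new memory, no autograd history

print(y.requires_grad, y.grad_fn)           # False None
print(y.data_ptr() == x.data_ptr())         # False -> separate storage
y[0] = -1.0                                 # safe: does not affect x or its graph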
[1] IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as resize_ / resize_as_ / set_ / transpose_) to the returned tensor also update the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as zero_ / copy_ / add_) to the returned tensor will not update the original tensor anymore, and will instead trigger an error.
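As a quick illustration of that note (on recent PyTorch versions this raises a RuntimeError; the exact message may vary by version):

import torch

x = torch.rand(2, 3, requires_grad=True)
d = x.detach()

try:
    d.resize_(6)   # in-place size/storage change on the returned tensor
except RuntimeError as e:
    print("resize_ on the detached tensor raised:", e)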