I am working on an attention model, and before running the final model I was going through the tensor shapes that flow through the code. There is an operation where I need to reshape a tensor of shape torch.Size([30, 8, 9, 64]), where 30 is the batch_size, 8 is the number of attention heads (not relevant to my question), 9 is the number of words in the sentence, and 64 is an intermediate embedding representation of each word. I have to reshape this tensor to torch.Size([30, 9, 512]) before processing it further.

A reference implementation I found online does x.transpose(1, 2).contiguous().view(30, -1, 512), whereas I was thinking that x.transpose(1, 2).reshape(30, -1, 512) should work just as well.

In the first case the grad_fn is <ViewBackward>, whereas in my case it is <UnsafeViewBackward>. Aren't these two the same operation? Will this result in a training error?
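For reference, here is a minimal sketch of what I am observing (assuming x is simply a random tensor of the shape described above):

import torch

# Stand-in for the attention tensor described above
x = torch.randn(30, 8, 9, 64, requires_grad=True)

ref = x.transpose(1, 2).contiguous().view(30, -1, 512)   # reference code
mine = x.transpose(1, 2).reshape(30, -1, 512)             # my version

print(ref.shape, ref.grad_fn)    # torch.Size([30, 9, 512]) <ViewBackward ...>
print(mine.shape, mine.grad_fn)  # torch.Size([30, 9, 512]) <UnsafeViewBackward ...>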
Each tensor has a grad_fn attribute that references the function that created it (except for tensors created by the user, which have None as their grad_fn).
Autograd is the PyTorch package for automatic differentiation of all operations on tensors. It performs backpropagation starting from a variable; in deep learning, this variable often holds the value of the cost function. backward() executes the backward pass and computes all the gradients automatically.
PyTorch computes derivatives by building a backward graph behind the scenes, with tensors and backward functions as the graph's nodes. In this graph, how PyTorch computes the derivative of a tensor depends on whether it is a leaf or not.
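A small illustrative sketch of these attributes (a toy tensor w, not taken from the question):

import torch

w = torch.randn(3, requires_grad=True)   # leaf tensor created by the user
print(w.grad_fn)                          # None

y = (w * 2).sum()                         # tensor created by an operation
print(y.grad_fn)                          # <SumBackward0 ...>

y.backward()                              # walks the backward graph
print(w.grad)                             # tensor([2., 2., 2.])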
Aren't these two the same operation?
No. While they produce effectively the same tensor, the operations are not the same, and they are not guaranteed to have the same storage.
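For example (a quick check, assuming the same shapes as in the question): reshape() shares storage with its input when it can, but has to copy once the input is non-contiguous, which is exactly the situation after transpose():

import torch

x = torch.randn(30, 8, 9, 64)

# On a contiguous tensor, reshape() can return a view of the same storage.
y = x.reshape(30, -1, 512)
print(y.data_ptr() == x.data_ptr())   # True: same underlying storage

# After transpose() the tensor is non-contiguous, so reshape() must copy.
z = x.transpose(1, 2).reshape(30, -1, 512)
print(z.data_ptr() == x.data_ptr())   # False: new storage was allocated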
From TensorShape.cpp in the PyTorch source:
// _unsafe_view() differs from view() in that the returned tensor isn't treated
// as a view for the purposes of automatic differentiation. (It's not listed in
// VIEW_FUNCTIONS in gen_autograd.py). It's only safe to use if the `self` tensor
// is temporary. For example, the viewed tensor here (a + b) is discarded immediately
// after viewing:
//
// res = at::_unsafe_view(a + b, size);
//
// This is a hack because in-place operations on tensors treated like views
// can be much more expensive than the same operations on non-view tensors.
Note that this can produce an error when applied to complex inputs, but complex-number support is generally not yet complete in PyTorch, and this limitation is not unique to this function.
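As for the training-error concern, a quick check (a sketch, not an exhaustive test) suggests both paths propagate the same gradients back to x, since the backward of each is just the inverse reshape and transpose:

import torch

x = torch.randn(30, 8, 9, 64, requires_grad=True)
upstream = torch.randn(30, 9, 512)          # stand-in gradient from later layers

a = x.transpose(1, 2).contiguous().view(30, -1, 512)   # ViewBackward
b = x.transpose(1, 2).reshape(30, -1, 512)              # UnsafeViewBackward

ga, = torch.autograd.grad(a, x, grad_outputs=upstream)
gb, = torch.autograd.grad(b, x, grad_outputs=upstream)

print(torch.equal(ga, gb))   # True: identical gradients for these real-valued inputs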