Background
I'm trying to stay within the torch framework so that if the data structures I work with are on the GPU, everything stays on the GPU (and vice versa), and I don't mix host-level and device-level variables.
The Problem
I want to define a small vector containing the dimensions of a tensor. I have a torch.Tensor containing data; let us call it data. If I write data.shape it returns
torch.Size([1, 2000, 3000])
I want to store this information in another torch.Tensor object so I write:
dimensional_tensor = torch.Tensor(data.shape)
The problem is that it doesn't store those values; instead it generates a Tensor object filled with what look like pseudorandom numbers, with the shape given by data.shape. That is, I get this output if I print dimensional_tensor:
tensor([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]])
From this output it looks like a zero tensor, but torch.unique(dimensional_tensor) yields non-zero elements.
If I write
dimensional_tensor = torch.Tensor(list(data.shape))
or even
dimensional_tensor = torch.tensor(data.shape)
then it does what I want. What is up with that?
I'm using PyTorch 2.1.0. I don't know how to inspect the Python source code for this and figure out why it is happening; I suspect that is because much of torch is compiled (binary-level) code.
Summary
Expected Result
As mentioned above, I expect the call torch.Tensor(data.shape) to yield a torch.Tensor containing the numerical values from data.shape.
Actual Result
A torch.Tensor object with shape data.shape, seemingly filled with pseudorandom numbers.
I don't have a problem with this if this is the way it should be. I merely want to understand the rationale behind this behavior, and I'm unable to find any documentation that explains it. Perhaps this is a bug?
My concern is that if the behavior is undefined and undocumented, applications that rely on it risk breaking in future PyTorch releases.
Short answer: The torch.Tensor constructor is overloaded to do the same thing as both torch.tensor and torch.empty.
torch.empty returns a tensor filled with uninitialized data, with its shape defined by the variadic size argument. So when you call torch.Tensor with a torch.Size, it is expected behaviour to get seemingly random (uninitialized) data with the shape defined by that torch.Size object.
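For illustration, here is a minimal sketch of the two overloads side by side (the small data tensor is just a stand-in for your real data):

import torch

data = torch.zeros(1, 4, 5)          # stand-in for the real data tensor

# Passing a torch.Size (a tuple of ints) hits the "size" overload:
# torch.Tensor(sizes) behaves like torch.empty(sizes) -> uninitialized memory.
t1 = torch.Tensor(data.shape)
print(t1.shape)                       # torch.Size([1, 4, 5]); contents are garbage

# Passing a sequence of values (e.g. a list) hits the "data" overload:
# torch.Tensor(values) behaves like torch.tensor(values) with float32 dtype.
t2 = torch.Tensor(list(data.shape))
print(t2)                             # tensor([1., 4., 5.])

# The unambiguous, documented equivalents:
t3 = torch.empty(data.shape)          # explicitly "uninitialized tensor of this shape"
t4 = torch.tensor(data.shape)         # explicitly "tensor holding these values"
print(t4)                             # tensor([1, 4, 5])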
There is a great comment about this on the PyTorch forum by a developer, which also explains why it is this way:
Our torch.Tensor constructor is overloaded to do the same thing as both torch.tensor and torch.empty. We thought this overload would make code confusing, so we split torch.Tensor into torch.tensor and torch.empty. So @yxchng yes, to some extent, torch.tensor works similarly to torch.Tensor (when you pass in data). @ProGamerGov no, neither should be more efficient than the other. It’s just that the torch.empty and torch.tensor have a nicer API than our legacy torch.Tensor constructor.
You are right that this should be documented; however, the documentation does mention:
To create a tensor with pre-existing data, use torch.tensor().
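Since the question is also about keeping everything on the same device, here is a minimal sketch (assuming a CUDA device may or may not be available) of storing data.shape with torch.tensor on the same device as data:

import torch

# Assuming `data` may live on the GPU; keep the shape tensor on the same device
# so host-level and device-level values are never mixed.
data = torch.zeros(1, 2000, 3000)
if torch.cuda.is_available():
    data = data.cuda()

dimensional_tensor = torch.tensor(data.shape, device=data.device)
print(dimensional_tensor)        # tensor([   1, 2000, 3000]) on data's device
print(dimensional_tensor.dtype)  # torch.int64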
Similar question: What is the difference between torch.tensor and torch.Tensor?