I'm trying to gain an in-depth understanding of how torch.from_numpy() works.
import numpy as np
import torch
arr = np.zeros((3, 3), dtype=np.float32)
t = torch.from_numpy(arr)
print("arr: {0}\nt: {1}\n".format(arr, t))
arr[0, 0] = 1
print("arr: {0}\nt: {1}\n".format(arr, t))
print("id(arr): {0}\nid(t): {1}".format(id(arr), id(t)))
The output looks like this:
arr: [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
t: tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
arr: [[1. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
t: tensor([[1., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
id(arr): 2360964353040
id(t): 2360964352984
This is part of the doc from torch.from_numpy():
from_numpy(ndarray) -> Tensor
Creates a Tensor from a numpy.ndarray. The returned tensor and ndarray share the same memory. Modifications to the tensor will be reflected in the ndarray and vice versa. The returned tensor is not resizable.
And this is taken from the doc of id():
Return the identity of an object.
This is guaranteed to be unique among simultaneously existing objects. (CPython uses the object's memory address.)
So here comes the question: 
Since the ndarray arr and tensor t share the same memory, why do they have different memory addresses?
Any ideas/suggestions?
Yes, t and arr are different Python objects at different regions of memory (hence the different ids), but both point to the same memory address that contains the data (a contiguous, usually, C array).
numpy operates on this region using C code bound to Python functions, and the same goes for torch (but using C++). id() doesn't know anything about the memory address of the data itself, only about that of its "wrappers".
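The wrapper-vs-data distinction can be checked directly: NumPy exposes its data pointer via `__array_interface__`, and PyTorch via `Tensor.data_ptr()`. A minimal sketch:

```python
import numpy as np
import torch

arr = np.zeros((3, 3), dtype=np.float32)
t = torch.from_numpy(arr)

# The Python wrappers are distinct objects...
print(id(arr) == id(t))  # False

# ...but both wrappers point at the same underlying data buffer:
print(arr.__array_interface__['data'][0] == t.data_ptr())  # True
```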
EDIT: When you assign b = a (assuming a is an np.ndarray), both names are references to the same Python wrapper (np.ndarray). In other words, they are the same object under different names.
It's just how Python's assignment works, see the documentation. All of the cases below would return True as well:
import torch
import numpy as np
tensor = torch.tensor([1, 2, 3])
tensor2 = tensor
id(tensor) == id(tensor2)  # True
arr = np.array([1, 2, 3, 4, 5])
arr2 = arr
id(arr) == id(arr2)  # True
some_str = "abba"
other_str = some_str
id(some_str) == id(other_str)  # True
value = 0
value2 = value
id(value) == id(value2)  # True
Now, when you use torch.from_numpy on an np.ndarray, you get two objects of different classes (torch.Tensor and the original np.ndarray). As those are of different types, they can't have the same id. One could see this case as analogous to the one below:
value = 3
string_value = str(3)
id(value) == id(string_value)
Here it's intuitive that string_value and value are two different objects at different memory locations.
EDIT 2:
All in all, the concept of a Python object and that of the underlying C array have to be kept separate. id() doesn't know about C bindings (how could it?), but it does know about the memory addresses of the Python wrappers (torch.Tensor, np.ndarray).
In the case of numpy and torch.Tensor you can have the following situations:
- shared memory, different ids: a torch.Tensor and the np.ndarray it was created from via torch.from_numpy
- separate memory, different ids: an independent torch.Tensor and another np.ndarray; could be created by from_numpy followed by clone() or a similar deep-copy operation
- shared memory, same id: two names for the same torch.Tensor object, one referencing the other as shown above
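These situations can be sketched in code, comparing the underlying data pointers via `Tensor.data_ptr()` and NumPy's `__array_interface__` (a minimal sketch of the three cases):

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# 1. Shared memory, different ids: torch.from_numpy
shared = torch.from_numpy(arr)
print(id(arr) == id(shared))  # False
print(arr.__array_interface__['data'][0] == shared.data_ptr())  # True

# 2. Separate memory, different ids: from_numpy followed by clone()
cloned = torch.from_numpy(arr).clone()
print(arr.__array_interface__['data'][0] == cloned.data_ptr())  # False

# 3. Same object, same id: plain assignment
alias = shared
print(id(alias) == id(shared))  # True
```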