I'm trying to unpickle a pytorch tensor, but pickling it back yields different results across runs:
>>> import pickle
>>> tensor1 = pickle.load(f) # I cannot reproduce the issue with some minimal manually-created tensor, only with this specific file
>>> tensor2 = pickle.load(f)
>>> pickled_tensor1 = pickle.dumps(tensor1)
>>> pickled_tensor2 = pickle.dumps(tensor2)
>>> pickled_tensor1 == pickled_tensor2
False
Below are the values of pickled_tensor1 and pickled_tensor2 respectively:
b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382183041680q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382183041680q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'
b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382172016592q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382172016592q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'
My question is why is it happening and how can I prevent this?
I am using Python 3.8; pytorch 1.7.0
Cheers, Hlib.
If you compare the tensors, you see that they are the same values, so the pickling process is working fine.
What is happening? Just try to change one of the numbers that differ and you'll get an error when asserting the key is in the deserialized_storage_keys. That tells you that these numbers are object keys and generated by pickle.
Can you avoid this? you can use torch.save, and saves to file or a BytesIO buffer that you can read if you wanted a byte-string
import torch
import pickle
from io import BytesIO
pt1 = b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382183041680q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382183041680q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'
pt2 = b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382172016592q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382172016592q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'
t1 = pickle.loads(pt1)
t2 = pickle.loads(pt2)
b1 = BytesIO()
b2 = BytesIO()
torch.save(t1, b1)
torch.save(t2, b2)
b1.seek(0)
pt1b = b1.read()
b2.seek(0)
pt2b = b2.read()
pt1b == pt2b
returns
True
Note that the size of the resulting strings is somewhat larger (419 vs 747)
torch.save also takes additional parameters, like the pickle_module to use, or the pickle_protocol
See: https://pytorch.org/docs/stable/generated/torch.save.html for more details
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With