
Pytorch tensor pickling inconsistent between runs

I'm trying to unpickle a pytorch tensor, but pickling it back yields different results across runs:

>>> import pickle

>>> tensor1 = pickle.load(f) # I cannot reproduce the issue with some minimal manually-created tensor, only with this specific file
>>> tensor2 = pickle.load(f)
>>> pickled_tensor1 = pickle.dumps(tensor1) 
>>> pickled_tensor2 = pickle.dumps(tensor2)
>>> pickled_tensor1 == pickled_tensor2
False

Below are the values of pickled_tensor1 and pickled_tensor2 respectively:

b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382183041680q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382183041680q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'
b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382172016592q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382172016592q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'

My question is: why is this happening, and how can I prevent it?

I am using Python 3.8; pytorch 1.7.0

Cheers, Hlib.

Hlib Babii asked May 04 '26 07:05



1 Answer

If you compare the two tensors, you will see that they hold the same values, so the pickling process itself is working fine.
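To check how little of the two byte strings actually differs, a stdlib-only sketch like the one below can locate the differing spans. The helper name differing_spans is hypothetical, and the two fragments are short excerpts copied from the byte strings in the question, not the full pickles.

```python
def differing_spans(a: bytes, b: bytes):
    """Return (start, end) index pairs where two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    spans = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                    # a differing run begins here
        elif x == y and start is not None:
            spans.append((start, i))     # the run ended just before i
            start = None
    if start is not None:
        spans.append((start, len(a)))    # run extends to the end
    return spans


# Excerpts from the two pickled byte strings in the question.
frag1 = b"X\x0f\x00\x00\x00140382183041680q\x02X\x03\x00\x00\x00cpu"
frag2 = b"X\x0f\x00\x00\x00140382172016592q\x02X\x03\x00\x00\x00cpu"

spans = differing_spans(frag1, frag2)
for start, end in spans:
    print(start, end, frag1[start:end], frag2[start:end])
```

Every differing span falls inside the 15-digit number; the surrounding bytes are identical in both fragments.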

What is happening? Try changing one of the numbers that differ (140382183041680 or 140382172016592) in the byte string: loading then fails on the assertion that the key is in deserialized_storage_keys. That tells you these numbers are object keys generated during pickling, not part of the tensor data.
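As a rough illustration of how an object key can make otherwise identical pickles differ, here is a plain-pickle sketch. It is an analogy, not PyTorch's actual code: the Blob class, the AddressKeyPickler class, and the choice of id(obj) as the key are all assumptions made for this example.

```python
import io
import pickle


class Blob:
    """Hypothetical stand-in for a storage object (not a PyTorch class)."""

    def __init__(self, data: bytes):
        self.data = data


class AddressKeyPickler(pickle.Pickler):
    """Writes each Blob as a persistent ID keyed by its memory address."""

    def persistent_id(self, obj):
        if isinstance(obj, Blob):
            # Assumption for illustration: the key is derived from id(obj),
            # which differs between two live objects.
            return ("blob", str(id(obj)), obj.data)
        return None  # everything else is pickled normally


def dump_with_address_key(obj) -> bytes:
    buf = io.BytesIO()
    AddressKeyPickler(buf).dump(obj)
    return buf.getvalue()


a = Blob(b"\x00\x00\x00?")
b = Blob(b"\x00\x00\x00?")

bytes_a = dump_with_address_key(a)
bytes_b = dump_with_address_key(b)

same_payload = a.data == b.data   # the contents are equal
same_bytes = bytes_a == bytes_b   # the serialized forms are not
print(same_payload, same_bytes)
```

Two blobs with identical payloads serialize to different bytes here because each key reflects where the object happens to live in memory, not what it contains.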

Can you avoid this? You can use torch.save, which writes to a file or to a BytesIO buffer that you can read back if you want a byte string:

import torch
import pickle
from io import BytesIO

pt1 = b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382183041680q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382183041680q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'
pt2 = b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382172016592q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382172016592q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'

t1 = pickle.loads(pt1)
t2 = pickle.loads(pt2)

b1 = BytesIO()
b2 = BytesIO()

torch.save(t1, b1) 
torch.save(t2, b2) 

b1.seek(0)
pt1b = b1.read()
b2.seek(0)
pt2b = b2.read()

pt1b == pt2b

returns

True

Note that the resulting byte strings are somewhat larger (747 bytes, versus 419 for the pickle.dumps output).

torch.save also takes additional parameters, such as the pickle_module to use and the pickle_protocol.

See: https://pytorch.org/docs/stable/generated/torch.save.html for more details

MrE answered May 05 '26 21:05



