Pytorch tensor pickling inconsistent between runs

Question

I'm trying to unpickle a PyTorch tensor, but pickling it back yields different results across runs:

>>> import pickle

>>> tensor1 = pickle.load(f) # I cannot reproduce the issue with some minimal manually-created tensor, only with this specific file
>>> tensor2 = pickle.load(f)
>>> pickled_tensor1 = pickle.dumps(tensor1) 
>>> pickled_tensor2 = pickle.dumps(tensor2)
>>> pickled_tensor1 == pickled_tensor2
False

Below are the values of pickled_tensor1 and pickled_tensor2 respectively:

b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382183041680q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382183041680q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'
b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382172016592q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382172016592q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'

My question is: why is this happening, and how can I prevent it?

I am using Python 3.8 and PyTorch 1.7.0.

Cheers, Hlib.

MrE · Accepted Answer

If you compare the tensors, you'll see that they hold the same values, so the pickling process is working fine.

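What is happening? Try changing one of the numbers that differ (for example 140382183041680) and unpickling the result: you'll get an assertion error when the storage keys read into deserialized_storage_keys are checked. That tells you these numbers are object keys, not tensor data. They are written by PyTorch's own serializer for the tensor's storage (note the torch.storage._load_from_bytes call inside the pickle), and they appear to be based on the storage's memory address. Every time the file is loaded, the storage is a new object at a new address, so pickling it again produces different bytes.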
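To see why equal data can pickle to different bytes, here is a stdlib-only analogy, not PyTorch's actual code. A hypothetical `Blob` class stands in for a storage, and a hypothetical `AddressKeyedPickler` (a `pickle.Pickler` subclass using the documented `persistent_id` hook) stores each `Blob` out-of-band under a key built from `id()`, which in CPython is the object's memory address. Treat the mapping onto PyTorch's internals as an assumption.

```python
import io
import pickle


class Blob:
    """Hypothetical stand-in for a tensor storage: just raw bytes."""

    def __init__(self, data: bytes):
        self.data = data


class AddressKeyedPickler(pickle.Pickler):
    """Stores each Blob out-of-band under a key built from id(obj).

    In CPython, id() is the object's memory address, so the key depends on
    where the object lives, not on what it contains.
    """

    def __init__(self, file):
        super().__init__(file, protocol=2)
        self.blobs = {}

    def persistent_id(self, obj):
        if isinstance(obj, Blob):
            key = str(id(obj))
            self.blobs[key] = obj.data
            return ("blob", key)
        return None  # everything else is pickled normally


def dumps_address_keyed(obj):
    buf = io.BytesIO()
    pickler = AddressKeyedPickler(buf)
    pickler.dump(obj)
    return buf.getvalue(), pickler.blobs


# Two distinct objects holding identical data (think: the same tensor loaded twice).
a = Blob(b"\x00\x00\x00?")
b = Blob(b"\x00\x00\x00?")
payload_a, blobs_a = dumps_address_keyed(a)
payload_b, blobs_b = dumps_address_keyed(b)
print(payload_a == payload_b)  # False: the embedded keys differ
print(list(blobs_a.values()) == list(blobs_b.values()))  # True: the data is the same
```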

Can you avoid this? You can use torch.save, which saves to a file or to a BytesIO buffer that you can read back if you want a byte string:

import torch
import pickle
from io import BytesIO

pt1 = b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382183041680q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382183041680q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'
pt2 = b'\x80\x04\x95\x98\x01\x00\x00\x00\x00\x00\x00\x8c\x0ctorch._utils\x94\x8c\x12_rebuild_tensor_v2\x94\x93\x94(\x8c\rtorch.storage\x94\x8c\x10_load_from_bytes\x94\x93\x94B\r\x01\x00\x00\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\x10\x00\x00\x00protocol_versionq\x01M\xe9\x03X\r\x00\x00\x00little_endianq\x02\x88X\n\x00\x00\x00type_sizesq\x03}q\x04(X\x05\x00\x00\x00shortq\x05K\x02X\x03\x00\x00\x00intq\x06K\x04X\x04\x00\x00\x00longq\x07K\x04uu.\x80\x02(X\x07\x00\x00\x00storageq\x00ctorch\nFloatStorage\nq\x01X\x0f\x00\x00\x00140382172016592q\x02X\x03\x00\x00\x00cpuq\x03K\x04Ntq\x04Q.\x80\x02]q\x00X\x0f\x00\x00\x00140382172016592q\x01a.\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\x94\x85\x94R\x94K\x00K\x02K\x02\x86\x94K\x02K\x01\x86\x94\x89\x8c\x0bcollections\x94\x8c\x0bOrderedDict\x94\x93\x94)R\x94t\x94R\x94.'

t1 = pickle.loads(pt1)
t2 = pickle.loads(pt2)

b1 = BytesIO()
b2 = BytesIO()

torch.save(t1, b1) 
torch.save(t2, b2) 

b1.seek(0)
pt1b = b1.read()
b2.seek(0)
pt2b = b2.read()

print(pt1b == pt2b)

prints

True
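Why are the torch.save bytes identical? A plausible explanation, stated here as an assumption: the zip-based format that torch.save has used by default since PyTorch 1.6 names storages by their order of appearance rather than by memory address. The stdlib-only sketch below illustrates that idea with a hypothetical `Blob` class and a hypothetical `SequentialKeyedPickler`; it is an analogy, not PyTorch's implementation.

```python
import io
import pickle


class Blob:
    """Hypothetical stand-in for a tensor storage: just raw bytes."""

    def __init__(self, data: bytes):
        self.data = data


class SequentialKeyedPickler(pickle.Pickler):
    """Stores each distinct Blob under "0", "1", ... in order of first appearance."""

    def __init__(self, file):
        super().__init__(file, protocol=2)
        self._keys = {}  # id(obj) -> key; the objects stay alive for the whole dump
        self.blobs = {}

    def persistent_id(self, obj):
        if isinstance(obj, Blob):
            # First time we see this object: assign the next sequential key.
            key = self._keys.setdefault(id(obj), str(len(self._keys)))
            self.blobs[key] = obj.data
            return ("blob", key)
        return None  # everything else is pickled normally


def dumps_sequential(obj):
    buf = io.BytesIO()
    pickler = SequentialKeyedPickler(buf)
    pickler.dump(obj)
    return buf.getvalue(), pickler.blobs


# Two distinct objects holding identical data.
a = Blob(b"\x00\x00\x00?")
b = Blob(b"\x00\x00\x00?")
payload_a, blobs_a = dumps_sequential(a)
payload_b, blobs_b = dumps_sequential(b)
print(payload_a == payload_b)  # True: both dumps use the key "0"
print(blobs_a == blobs_b)  # True
```

Reusing an object within the same dump reuses its key, so `dumps_sequential([a, a, b])` stores only two blobs, under "0" and "1".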

Note that the torch.save output is noticeably larger: 747 bytes, versus 419 bytes for the pickle.

torch.save also accepts additional parameters, such as pickle_module (the pickle module to use) and pickle_protocol.

See https://pytorch.org/docs/stable/generated/torch.save.html for more details.

Tags: python, pickle, pytorch

Asked by Hlib Babii · answered by MrE
