In a post I wrote yesterday, I accidentally found that changing the __qualname__ of a function has an unexpected effect on pickle. After running more tests, I found that pickle does not handle functions the way I thought, and that changing a function's __qualname__ really does affect how pickle behaves.
Here are the tests I ran:
import pickle
from sys import modules

# a simple function to pickle
def hahaha(): return 1
print('hahaha', hahaha, '\n')

# change the __qualname__ of function hahaha
hahaha.__qualname__ = 'sdfsdf'
print('set hahaha __qualname__ to sdfsdf', hahaha, '\n')

# make a copy of hahaha
setattr(modules['__main__'], 'abcabc', hahaha)
print('create abcabc which is just hahaha', abcabc, '\n')

try:
    pickle.dumps(hahaha)
except Exception as e:
    print('pickle hahaha')
    print(e, '\n')

try:
    pickle.dumps(abcabc)
except Exception as e:
    print('pickle abcabc, a copy of hahaha')
    print(e, '\n')

try:
    pickle.dumps(sdfsdf)
except Exception as e:
    print('pickle sdfsdf')
    print(e)
As you can see by running the snippet, neither hahaha nor abcabc can be pickled; both fail with the exception:
Can't pickle <function sdfsdf at 0x7fda36dc5f28>: attribute lookup sdfsdf on __main__ failed
I'm really confused by this exception.
What does pickle look for when it pickles a function? Although the __qualname__ of hahaha was changed to 'sdfsdf', the function hahaha and its copy abcabc are still callable in the session (both appear in dir(sys.modules['__main__'])), so why can't pickle pickle them?
What is the real effect of changing the __qualname__ of a function? I understand that changing the __qualname__ of hahaha to 'sdfsdf' won't make sdfsdf callable, as it won't show up in dir(sys.modules['__main__']). However, as you can see by running the snippet, after changing the __qualname__ of hahaha to 'sdfsdf', the object hahaha and its copy abcabc are displayed as something like <function sdfsdf at 'some_address'>. What is the difference between the objects in sys.modules['__main__'] and <function sdfsdf at 'some_address'>?
Pickle is used for serializing and de-serializing Python object structures, also called marshalling or flattening. Serialization refers to the process of converting an object in memory to a byte stream that can be stored on disk or sent over a network.
“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
First, import pickle, then define an example dictionary, which is an ordinary Python object. Next, open a file for writing bytes (mode 'wb' in Python 3), use pickle.dump() to write the dict into the open file, and close the file.
To save an ML model using pickle, all we need to do is pass the model object (together with an open file) to pickle's dump() function. This serializes the object into a byte stream that we can save as a file, for example model.pkl.
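The dump-to-file steps above can be sketched roughly as follows. The dictionary contents and the file name example.pkl are made up for illustration, and the file is written to a temporary directory so nothing is left behind.

```python
import os
import pickle
import tempfile

# Example dictionary; the contents are invented for this sketch.
example = {"name": "demo", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name inside the temporary directory.
    path = os.path.join(tmp, "example.pkl")

    # Open the file for writing bytes and dump the dict into it.
    with open(path, "wb") as f:
        pickle.dump(example, f)

    # Open the file for reading bytes and load the object back.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored dict is an equal but separate object.
print(restored == example, restored is example)
```

Using with blocks closes the file automatically, which replaces the explicit close step.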
Pickling of function objects is defined in the save_global method in pickle.py.
First, the name of the function is retrieved via __qualname__:
name = getattr(obj, '__qualname__', None)
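Next, once the name of the function's module has been determined, that module is imported (a module that is already loaded is simply fetched from sys.modules):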
__import__(module_name, level=0)
module = sys.modules[module_name]
The module object is then used to look up the function by that name, as an attribute:
obj2, parent = _getattribute(module, name)
obj2 should now be the function itself (pickle goes on to check that obj2 is the very object being pickled), but since sdfsdf doesn't exist in this module, the lookup fails and pickling stops here.
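To make the lookup concrete, here is a rough imitation of those steps. It is only a sketch, not pickle's actual code: resolves_to_itself is a hypothetical helper, and qualname_demo is a made-up stand-in module used instead of __main__ so the example behaves the same wherever it is run.

```python
import sys
import types


def resolves_to_itself(obj):
    """Rough imitation of the name lookup described above (not pickle's real code)."""
    # Step 1: take the name from __qualname__, falling back to __name__.
    name = getattr(obj, "__qualname__", None) or obj.__name__
    # Step 2: import the module by name and fetch it from sys.modules.
    __import__(obj.__module__, level=0)
    module = sys.modules[obj.__module__]
    # Step 3: look the (possibly dotted) name up as an attribute of the module.
    target = module
    try:
        for part in name.split("."):
            target = getattr(target, part)
    except AttributeError:
        return False
    # The name must lead back to the very same object.
    return target is obj


# Hypothetical stand-in module so the sketch does not depend on __main__.
demo = types.ModuleType("qualname_demo")
sys.modules["qualname_demo"] = demo


def hahaha():
    return 1


hahaha.__module__ = "qualname_demo"
hahaha.__qualname__ = "hahaha"
demo.hahaha = hahaha
demo.abcabc = hahaha  # a second name for the same function

before = resolves_to_itself(hahaha)  # the module has an attribute "hahaha"

hahaha.__qualname__ = "sdfsdf"
after = resolves_to_itself(hahaha)   # the module has no attribute "sdfsdf"

print(before, after)
```

Note that the names hahaha and abcabc play no part in the lookup once __qualname__ has changed: only the name stored on the function object is consulted, which is why both fail in the same way.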
You can make this work, but you have to be consistent: the module needs an attribute whose name matches the new __qualname__:
>>> import sys
>>> import pickle
>>> def hahaha(): return 1
>>> hahaha.__qualname__ = "sdfsdf"
>>> setattr(sys.modules["__main__"], "sdfsdf", hahaha)
>>> pickle.dumps(hahaha)
b'\x80\x04\x95\x17\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x06sdfsdf\x94\x93\x94.'
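As a quick check of the same idea, the sketch below shows the failure, the fix, and a round trip through pickle.loads. It again uses a made-up stand-in module named qualname_demo instead of __main__ (an assumption for portability, not part of the original session).

```python
import pickle
import sys
import types

# Hypothetical stand-in module so the sketch does not depend on __main__.
demo = types.ModuleType("qualname_demo")
sys.modules["qualname_demo"] = demo


def hahaha():
    return 1


hahaha.__module__ = "qualname_demo"
hahaha.__qualname__ = "sdfsdf"

# Inconsistent state: the module has no attribute named "sdfsdf" yet.
try:
    pickle.dumps(hahaha)
    failed = False
except pickle.PicklingError:
    failed = True

# Consistent state: the name stored in __qualname__ now exists in the module.
demo.sdfsdf = hahaha
data = pickle.dumps(hahaha)
restored = pickle.loads(data)

# The pickle holds only the module name and the function name, so loading
# it looks the name up again and returns the existing function object.
print(failed, restored is hahaha, b"sdfsdf" in data)
```

Because a function is pickled by reference, the byte string contains no code at all; unpickling succeeds only where the same module and name can be found again.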