I have the following setup:
a.py:
    class A(object):
        def __init__(self, name):
            self.name = name

        def a(self):
            print('yow {}!'.format(self.name))
b.py:
    class B(object):
        def __init__(self, obj):
            self.obj = obj
sender.py:
    import pickle

    from a import A
    from b import B

    message = pickle.dumps(B(A('Martin')))
receiver.py:
    import pickle

    # `message` is the pickled payload received from RabbitMQ
    my_b = pickle.loads(message)
    my_a = my_b.obj
    my_a.a()
Output: yow Martin!
In sender.py I pickle a B instance, which acts as a carrier for an A instance, and send the pickled bytes via RabbitMQ to another process. In receiver.py (which runs in that other process) I get the message from RabbitMQ and unpickle the B instance, and as if by magic the modules defining B and A get imported automatically. Can I control what gets imported? I would like the worker receiver.py to consume as little memory as possible, but if modules get imported outside my control it can get bloated very quickly.
Could someone explain how pickle imports things and what to do about it?
What kind of control is required? As you can see from the source (this is Python 2's pickle.py), when you run pickle.loads(content) it actually does:
    def loads(str):
        file = StringIO(str)
        return Unpickler(file).load()
Then there is some magic. It reads the string as a file and dispatches on its content one byte at a time, where each byte is an opcode key such as:
    GLOBAL = 'c'  # push self.find_class(modname, name); 2 string args
    INST   = 'i'  # build & push class instance
The loading function itself:
    def load(self):
        """Read a pickled object representation from the open file.
        Return the reconstituted object hierarchy specified in the file.
        """
        ...
        read = self.read  # self.read = file.read, which is StringIO's read()
        dispatch = self.dispatch
        try:
            while 1:
                key = read(1)
                dispatch[key](self)  # for GLOBAL/INST keys this handler ends up importing the module
        except _Stop, stopinst:
            return stopinst.value
You are interested in the find_class() method, which is used by several of the load functions (load_inst() and load_global()):
    def find_class(self, module, name):
        # Subclasses may override this:
        __import__(module)  # plain import; override find_class() to change this
        mod = sys.modules[module]
        klass = getattr(mod, name)
        return klass
For example, the load_inst() function:
    def load_inst(self):
        module = self.readline()[:-1]
        name = self.readline()[:-1]
        klass = self.find_class(module, name)
        # Now module is imported and ready to be used:
        self._instantiate(klass, self.marker())
    dispatch[INST] = load_inst
So, if you want to control which namespaces or modules can be imported, you will need to subclass Unpickler and override find_class() to fit your goals.
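As a rough illustration of that advice, here is a minimal sketch of such a subclass, written for Python 3 (where the stream is bytes and the file object is io.BytesIO rather than StringIO). The names ALLOWED, RestrictedUnpickler and restricted_loads are made up for this example, and stdlib classes stand in for A and B so the snippet runs on its own. In a real receiver the allow-list would presumably name your own modules.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Hypothetical allow-list: the only (module, name) pairs the receiver may import.
ALLOWED = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global (class or function) the pickle stream references.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(
                "refusing to import {}.{}".format(module, name))
        return super().find_class(module, name)


def restricted_loads(data):
    """Like pickle.loads(), but only imports allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# An allow-listed class round-trips normally.
ok = restricted_loads(pickle.dumps(OrderedDict(a=1)))

# Anything else is rejected before its module is looked up.
try:
    restricted_loads(pickle.dumps(Fraction(1, 2)))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Raising inside find_class() stops the load before the default implementation gets a chance to import the module, which is the point at which you get to decide.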
Pickle records the __module__ attribute of A and B when dumping, and imports that module when loading:
>>> A.__module__
'a'
>>> __import__(A.__module__)
<module 'a' from 'a.py'>
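To see that the module name really travels inside the pickle, you can list the globals a stream refers to without loading it. This is only a sketch: referenced_globals is a hypothetical helper, a stdlib class stands in for A, and protocol 2 is chosen on purpose because it writes each class reference as a single GLOBAL opcode (protocol 4 and later use STACK_GLOBAL, which this helper does not handle).

```python
import pickle
import pickletools
from fractions import Fraction


def referenced_globals(data):
    """Return the 'module name' strings named by GLOBAL opcodes, without unpickling."""
    return [arg for opcode, arg, _pos in pickletools.genops(data)
            if opcode.name == "GLOBAL"]


# Protocol 2 stores the class reference as one GLOBAL opcode.
data = pickle.dumps(Fraction(1, 2), protocol=2)
names = referenced_globals(data)
```

Something like this could be run on a sample message to find out which modules the receiver will end up importing.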
If you want to control what is imported, you can structure your Python packages so that from a import A doesn't load too many objects.