
How to control what gets imported when you unpickle a Python object?

I have the following setup:

a.py:

class A(object):
    def __init__(self, name):
        self.name = name
    def a(self):
        print('yow {}!'.format(self.name))

b.py:

class B(object):
    def __init__(self, obj):
        self.obj = obj

sender.py:

import pickle

from a import A
from b import B
message = pickle.dumps(B(A('Martin')))

receiver.py:

import pickle

# message is the pickled payload received from RabbitMQ
my_b = pickle.loads(message)
my_a = my_b.obj
my_a.a()

Output: yow Martin!

In sender.py I pickle an instance of B, which acts as a carrier for an instance of A, and send the pickled object via RabbitMQ to another process. In receiver.py (a separate process) I receive the message via RabbitMQ and unpickle it, and as if by magic the modules defining B and A get imported automatically. Can I control what gets imported? I would like the worker receiver.py to consume as little memory as possible, but if modules get imported outside my control it can get bloated very quickly.

Could someone explain how pickle imports stuff and what to do about it?

ragezor asked Sep 13 '25 22:09


2 Answers

What kind of control is required? As you can see from the source of the pure-Python pickle module (the Python 2 version is quoted here), when you run pickle.loads(content) it actually does:

def loads(str):
    file = StringIO(str)
    return Unpickler(file).load()

Then there is some magic. It reads the string as a file and dispatches on its content, one opcode key at a time:

GLOBAL          = 'c'   # push self.find_class(modname, name); 2 string args
INST            = 'i'   # build & push class instance

The load function itself:

def load(self):
    """Read a pickled object representation from the open file.
    Return the reconstituted object hierarchy specified in the file.
    """
    ...
    read = self.read  # self.read = file.read, which is StringIO's read()
    dispatch = self.dispatch
    try:
        while 1:
            key = read(1)
            dispatch[key](self)  # for GLOBAL and INST keys, this call is what triggers the import
    except _Stop, stopinst:
        return stopinst.value

The method you are interested in is find_class(), which is called by several of the load functions (load_inst() and load_global()):

def find_class(self, module, name):
    # Subclasses may override this:
    __import__(module)  # straightforward import; you can override it
    mod = sys.modules[module]
    klass = getattr(mod, name)
    return klass

For example, the load_inst() function:

def load_inst(self):
    module = self.readline()[:-1]
    name = self.readline()[:-1]
    klass = self.find_class(module, name)
    # Now module is imported and ready to be used:
    self._instantiate(klass, self.marker())
dispatch[INST] = load_inst

So, if you want to control which namespaces or modules can be imported, you will need to subclass Unpickler and override find_class() to fit your goals.
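A minimal sketch of that idea for Python 3, where the same find_class() hook exists on pickle.Unpickler. The names ALLOWED_GLOBALS, RestrictedUnpickler and restricted_loads are made up for illustration, and stdlib classes stand in for A and B; in the question's setup the allow-list would presumably hold ("a", "A") and ("b", "B") instead.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Hypothetical allow-list of (module, name) pairs the receiver accepts.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle references. Rejecting here
        # means the default lookup (and its import) never runs.
        if (module, name) not in ALLOWED_GLOBALS:
            raise pickle.UnpicklingError(
                "refusing to import {}.{}".format(module, name))
        return super().find_class(module, name)


def restricted_loads(data):
    # Drop-in replacement for pickle.loads() that applies the allow-list.
    return RestrictedUnpickler(io.BytesIO(data)).load()


# An allow-listed class loads as usual.
allowed = restricted_loads(pickle.dumps(OrderedDict(x=1)))

# Anything else is rejected before find_class() would import its module.
try:
    restricted_loads(pickle.dumps(Fraction(1, 2)))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Note that this only decides which modules pickle itself asks for. Whatever those modules import at their own top level still gets loaded.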

sobolevn answered Sep 15 '25 10:09


Pickle stores the module name taken from the __module__ attribute of A and B, and imports that module when loading:

>>> A.__module__
'a'
>>> __import__(A.__module__)
<module 'a' from 'a.py'>

If you want to control what is imported, you can structure your Python packages so that from a import A doesn't pull in too many other modules.
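Before restructuring anything, it may help to see which modules a given message actually asks for. A rough sketch for Python 3, assuming a hypothetical RecordingUnpickler helper that logs every (module, name) pair pickle resolves and then defers to the default lookup; Fraction is only a stand-in for your own classes.

```python
import io
import pickle
from fractions import Fraction


class RecordingUnpickler(pickle.Unpickler):
    """Hypothetical helper: remembers every global the pickle refers to."""

    def __init__(self, file):
        super().__init__(file)
        self.requested = []

    def find_class(self, module, name):
        # Log the request, then let pickle import and resolve it as usual.
        self.requested.append((module, name))
        return super().find_class(module, name)


def loads_and_report(data):
    # Returns the unpickled object plus the list of globals it required.
    unpickler = RecordingUnpickler(io.BytesIO(data))
    obj = unpickler.load()
    return obj, unpickler.requested


obj, requested = loads_and_report(pickle.dumps([Fraction(1, 2), Fraction(3, 4)]))
print(requested)
```

The recorded list only shows the modules named in the pickle itself; comparing sys.modules before and after the load would also reveal what those modules import in turn.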

Vincent answered Sep 15 '25 10:09