Unpickling a function into a different context in Python

I have written a Python interface to a process-centric job distribution system we're developing/using internally at my workplace. While reasonably skilled programmers, the primary people using this interface are research scientists, not software developers, so ease-of-use and keeping the interface out of the way to the greatest degree possible is paramount.

My library unrolls a sequence of inputs into a sequence of pickle files on a shared file server, then spawns jobs that load those inputs, perform the computation, pickle the results, and exit; the client script then picks back up and produces a generator that loads and yields the results (or rethrows any exception the calculation function raised).

This is only useful since the calculation function itself is one of the serialized inputs. cPickle is quite content to pickle function references, but requires the pickled function to be reimportable in the same context. This is problematic. I've already solved the problem of finding the module to reimport it, but the vast majority of the time, it is a top-level function that is pickled and, thus, does not have a module path. The only strategy I've found to be able to unpickle such a function on the computation nodes is this nauseating little approach towards simulating the original environment in which the function was pickled before unpickling it:

...
# At this point, we've identified the source of the target function.
# A string by its name lives in "modname".
# In the real code, there is significant try/except work here.

targetModule = __import__(modname)
globalRef = globals()
for thingie in dir(targetModule):
    if thingie not in globalRef:
        globalRef[thingie] = targetModule.__dict__[thingie]

# sys.argv[2]: the path to the pickle file common to all jobs, which contains
# any data in common to all invocations of the target function, then the
# target function itself
commonFile = open(sys.argv[2], "rb")
commonUnpickle = cPickle.Unpickler(commonFile)
commonData = commonUnpickle.load()
# the actual function unpack I'm having trouble with:
doIt = commonUnpickle.load()

The final line is the most important one here: it's where my module picks up the function it should actually be running. This code, as written, works as desired, but directly manipulating the symbol tables like this is unsettling.

How can I do this, or something very much like it, without forcing the research scientists to separate their calculation scripts into a proper class structure the way pickle desperately wants (they use Python like the most excellent graphing calculator ever, and I would like to continue to let them do so), and without the unpleasant, unsafe, and just plain scary __dict__-and-globals() manipulation I'm using above? I fervently believe there has to be a better way, but exec "from {0} import *".format("modname") didn't do it, several attempts to inject the pickle load into the targetModule reference didn't do it, and eval("commonUnpickle.load()", targetModule.__dict__, locals()) didn't do it. All of these fail with the unpickler's AttributeError over being unable to find the function in <module>.

What is a better way?


asked Aug 18 '11 19:08

Adam Norberg

2 Answers

Pickling functions can be rather annoying when you are trying to move them into a different context. If the function does not reference anything from the module it lives in, and references (if anything) only modules that are guaranteed to be imported, you might look at some code from a Rudimentary Database Engine recipe in the Python Cookbook.

In order to support views, the academic module grabs the code from the callable when pickling the query. When it comes time to unpickle the view, a LambdaType instance is created with the code object and a reference to a namespace containing all imported modules. The solution has limitations but worked well enough for the exercise.

Example for Views

import sys
import types

class _View:

    def __init__(self, database, query, *name_changes):
        "Initializes _View instance with details of saved query."
        self.__database = database
        self.__query = query
        self.__name_changes = name_changes

    def __getstate__(self):
        "Returns everything needed to pickle _View instance."
        # Only the query's code object is kept, not the function itself.
        return self.__database, self.__query.__code__, self.__name_changes

    def __setstate__(self, state):
        "Sets the state of the _View instance when unpickled."
        database, query, name_changes = state
        self.__database = database
        # Rebuild the callable from its code object; sys.modules serves as
        # its global namespace, so only imported modules are visible to it.
        self.__query = types.LambdaType(query, sys.modules)
        self.__name_changes = name_changes
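
As a rough standalone illustration of the same idea (not the recipe's actual code), the sketch below ships only a function's code object and rebuilds it on the other side. It assumes Python 3 and uses marshal to serialize the code object, since plain pickle does not handle code objects on its own. The names pack_function, unpack_function, and hypot are hypothetical. The namespace here is a copy of sys.modules plus __builtins__, so the rebuilt function can see imported modules and built-ins but nothing else from its original module.

```python
import builtins
import marshal
import math  # imported so that "math" is present in sys.modules
import pickle
import sys
import types


def pack_function(func):
    """Serialize a function's name and code object (defaults and closures are dropped)."""
    return pickle.dumps((func.__name__, marshal.dumps(func.__code__)))


def unpack_function(payload, namespace=None):
    """Rebuild a function whose globals are `namespace`."""
    name, code_bytes = pickle.loads(payload)
    code = marshal.loads(code_bytes)
    if namespace is None:
        # Assumption for this sketch: expose every imported module by name,
        # plus the built-ins, and nothing else.
        namespace = dict(sys.modules)
        namespace["__builtins__"] = builtins
    return types.FunctionType(code, namespace, name)


def hypot(a, b):
    # "math" is looked up in whatever namespace the function is rebuilt with.
    return math.sqrt(a * a + b * b)


payload = pack_function(hypot)
rebuilt = unpack_function(payload)
print(rebuilt(3, 4))
```

Note that marshal output is specific to the Python version that produced it, so this only makes sense when both sides run the same interpreter version.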

Sometimes it appears necessary to modify the registered modules available in the system. If, for example, you need to refer to the first module (__main__), you may need to create a new module object and load your available namespace into it. The same recipe used the following technique.

Example for Modules

def test_northwind():
    "Loads and runs some tests on the sample Northwind database."
    import os, sys, imp
    # Patch the module namespace to recognize this file.
    # (imp was removed in Python 3.12; types.ModuleType(name) is the
    # equivalent of imp.new_module(name) there.)
    name = os.path.splitext(os.path.basename(sys.argv[0]))[0]
    module = imp.new_module(name)
    vars(module).update(globals())
    sys.modules[name] = module
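
A minimal sketch of how that patching interacts with pickle, assuming Python 3 and types.ModuleType in place of imp.new_module. The module name, the script source, and the register_module helper are made up for illustration. Because pickle stores a function as a module name plus a function name, the load only succeeds once a module with that name is registered in sys.modules.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a scientist's top-level script.
SCRIPT_SOURCE = "def square(x):\n    return x * x\n"
MODULE_NAME = "hypothetical_user_script"  # made-up name for illustration


def register_module(name, source):
    """Build a module object from source text and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(source, vars(module))
    sys.modules[name] = module
    return module


# Client side: the pickle holds only a reference (module name + function name).
client_module = register_module(MODULE_NAME, SCRIPT_SOURCE)
payload = pickle.dumps(client_module.square)

# Simulate a fresh worker process that has never seen the module.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    failed_without_module = False
except (ImportError, AttributeError):
    failed_without_module = True

# Worker side: register the module first, then unpickle by reference.
worker_module = register_module(MODULE_NAME, SCRIPT_SOURCE)
restored = pickle.loads(payload)
print(failed_without_module, restored(7))
```

This keeps the change confined to one named entry in sys.modules rather than copying names into the worker's own globals().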


answered Oct 05 '22 07:10

Noctis Skytower

Your question was long, and I was too caffeinated to make it through all of it… However, I think you are looking to do something for which there's already a pretty good existing solution. There's a fork of the parallel python (i.e. pp) library that takes functions and objects, serializes them, sends them to different servers, and then unpickles and executes them. The fork lives inside the pathos package, but you can download it independently here:

http://danse.cacr.caltech.edu/packages/dev_danse_us

The "other context" in that case is another server… and the objects are transported by converting the objects to source code and then back to objects.

If you are looking to use pickling, much in the way you are doing already, there's an extension to mpi4py that serializes arguments and functions, and returns pickled return values… The package is called pyina, and is commonly used to ship code and objects to cluster nodes in coordination with a cluster scheduler.

Both pathos and pyina provide map abstractions (and pipe), and try to hide all of the details of parallel computing behind the abstractions, so scientists don't need to learn anything except how to program normal serial python. They just use one of the map or pipe functions, and get parallel or distributed computing.

Oh, I almost forgot. The dill serializer includes dump_session and load_session functions that allow the user to easily serialize their entire interpreter session and send it to another computer (or just save it for later use). That's pretty handy for changing contexts, in a different sense.

Get dill, pathos, and pyina here: https://github.com/uqfoundation

answered Oct 05 '22 07:10

Mike McKerns

Tags: python, serialization, pickle
