
How to pickle a python function with its dependencies?

As a follow-up to this question: Is there an easy way to pickle a python function (or otherwise serialize its code)?

I would like to see an example of this bullet from the above post:

"If the function references globals (including imported modules, other functions etc) that you need to pick up, you'll need to serialise these too, or recreate them on the remote side. My example just gives it the remote process's global namespace."

I have a simple test going where I am writing a function's bytecode to a file using marshal:

    import marshal

    def g(self, blah):
        print blah

    def f(self):
        for i in range(1, 5):
            print 'some function f'
            g('some string used by g')

    data = marshal.dumps(f.func_code)

    file = open('/tmp/f2.txt', 'w')
    file.write(data)

Then, starting a fresh python instance, I do:

    import marshal, types

    file = open('/tmp/f2.txt', 'r')
    code = marshal.loads(file.read())
    func2 = types.FunctionType(code, globals(), "some_func_name")
    func2('blah')

This results in a:

NameError: global name 'g' is not defined 

This happens regardless of how I try to include g. I have tried sending g over in basically the same way as f, but f still cannot see g. How do I get g into the global namespace so that it can be used by f in the receiving process?

Someone also recommended looking at pyro as an example of how to do this. I have already made an attempt at trying to understand the related code in the disco project. I took their dPickle class and tried to recreate their disco/tests/test_pickle.py functionality in a standalone app without success. My experiment had problems doing the function marshaling with the dumps call. Anyway, maybe a pyro exploration is next.

In summary, the basic functionality I am after is being able to send a method over the wire and have all the basic "workspace" methods sent over with it (like g).

Example with changes from answer:

Working function_writer:

    import marshal, types

    def g(blah):
        print blah

    def f():
        for i in range(1, 5):
            print 'some function f'
            g('blah string used by g')

    f_data = marshal.dumps(f.func_code)
    g_data = marshal.dumps(g.func_code)

    f_file = open('/tmp/f.txt', 'w')
    f_file.write(f_data)

    g_file = open('/tmp/g.txt', 'w')
    g_file.write(g_data)

Working function_reader:

    import marshal, types

    f_file = open('/tmp/f.txt', 'r')
    g_file = open('/tmp/g.txt', 'r')

    f_code = marshal.loads(f_file.read())
    g_code = marshal.loads(g_file.read())

    f = types.FunctionType(f_code, globals(), 'f')
    g = types.FunctionType(g_code, globals(), 'g')

    f()
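The reader above works because f and g are both rebuilt over the same globals dict, so f can look g up by name when it runs. For readers on Python 3, here is a minimal sketch of the same idea, not a definitive implementation. It assumes both sides run the same Python version (the marshal format is not guaranteed to be compatible across versions) and keeps everything in one process for brevity. The names `namespace` and `lonely` are hypothetical and exist only for this illustration.

```python
import builtins
import marshal
import types


def g(blah):
    print(blah)


def f():
    for i in range(1, 5):
        print('some function f')
        g('blah string used by g')


# "Sender" side: serialize only the code objects (func_code is __code__ in Python 3).
f_data = marshal.dumps(f.__code__)
g_data = marshal.dumps(g.__code__)

# "Receiver" side: rebuild both functions over ONE shared namespace dict.
# `namespace` is a hypothetical stand-in for the remote process's globals.
namespace = {'__builtins__': builtins}
for name, data in (('f', f_data), ('g', g_data)):
    code = marshal.loads(data)
    namespace[name] = types.FunctionType(code, namespace, name)

namespace['f']()  # f resolves g through `namespace` at call time

# For contrast: f rebuilt over a namespace that lacks g fails on the first call to g.
lonely = types.FunctionType(marshal.loads(f_data), {'__builtins__': builtins}, 'f')
try:
    lonely()
except NameError as exc:
    print('without g in the namespace:', exc)
```

Passing an explicit dict instead of globals() keeps the shipped functions from silently picking up unrelated names on the receiving side, at the cost of having to list every dependency yourself.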

Ryan R. asked Apr 06 '12 19:04



1 Answer

Updated Sep 2020: See the comment by @ogrisel below. The developers of PiCloud moved to Dropbox shortly after I wrote the original version of this answer in 2013, though a lot of folks are still using the cloudpickle module seven years later. The module made its way to Apache Spark, where it has continued to be maintained and improved. I'm updating the example and background text below accordingly.

Cloudpickle

The cloudpickle package is able to pickle a function, method, class, or even a lambda, as well as any dependencies. To try it out, just pip install cloudpickle and then:

    import cloudpickle

    def foo(x):
        return x*3

    def bar(z):
        return foo(z)+1

    x = cloudpickle.dumps(bar)
    del foo
    del bar

    import pickle

    f = pickle.loads(x)
    print(f(3))  # displays "10"

In other words, just call cloudpickle.dump() or cloudpickle.dumps() the same way you'd use pickle.*, then later use the native pickle.load() or pickle.loads() to thaw.
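As background on why the dumping side needs cloudpickle at all: the standard pickle module serializes a function by reference (its module and name) and expects the receiver to import it. The sketch below uses only the standard library to illustrate that. Using `json.dumps` as the function being pickled is an arbitrary choice for illustration, and `payload` is a hypothetical name.

```python
import json
import pickle

# Plain pickle records WHERE to find the function (module + name), not its bytecode.
payload = pickle.dumps(json.dumps)
print(payload)
print(b'json' in payload and b'dumps' in payload)

# Unpickling simply re-imports the module and looks the name up again.
print(pickle.loads(payload) is json.dumps)

# A lambda has no importable name, so plain pickle refuses to serialize it.
try:
    pickle.dumps(lambda x: x * 3)
except (pickle.PicklingError, AttributeError) as exc:
    print('plain pickle failed:', exc)
```

This is why a function defined interactively, or a lambda, cannot be shipped to a fresh process with plain pickle alone.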

Background

PiCloud.com released the cloud python package under the LGPL, and other open-source projects quickly started using it (google for cloudpickle.py to see a few). The folks at picloud.com had an incentive to put the effort into making general-purpose code pickling work: their whole business was built around it. The idea was that if you had cpu_intensive_function() and wanted to run it on Amazon's EC2 grid, you just replaced:

cpu_intensive_function(some, args)  

with:

cloud.call(cpu_intensive_function, some, args) 

The latter used cloudpickle to pickle up any dependent code and data, shipped it to EC2, ran it, and returned the results to you when you called cloud.result().

PiCloud billed in millisecond increments and was cheap as heck. I used it all the time for Monte Carlo simulations and financial time series analysis, when I needed hundreds of CPU cores for just a few seconds each. Years later, I still can't say enough good things about it, and I didn't even work there.

stevegt answered Sep 26 '22 07:09