
Embarrassingly parallel tasks with IPython Parallel (or other package) depending on unpicklable objects

I often hit problems where I want to do something simple over a set of many, many objects quickly. My natural choice is IPython Parallel for its simplicity, but often I have to deal with unpicklable objects. After trying for a few hours I usually resign myself to running my tasks overnight on a single computer, or doing something stupid like splitting the work semi-manually across multiple Python scripts.

To give a concrete example, suppose I want to delete all keys in a given S3 bucket.

What I'd normally do without thinking is:

import boto
from IPython.parallel import Client

# awskey and awssec are my AWS credentials, defined elsewhere
connection = boto.connect_s3(awskey, awssec)
bucket = connection.get_bucket('mybucket')

client = Client()
loadbalancer = client.load_balanced_view()

# Fails: each Key has to be pickled to reach an engine
keyList = list(bucket.list())
loadbalancer.map(lambda key: key.delete(), keyList)

The problem is that the Key object in boto is unpicklable (*). This happens to me very often in different contexts. It's also a problem with multiprocessing, execnet, and all the other frameworks and libraries I tried (for obvious reasons: they all use the same pickler to serialize objects).
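To make the failure concrete, here is a minimal sketch using only the standard library. `FakeKey` is a hypothetical stand-in, not boto's actual `Key` class, and the assumption is that the real object fails for a similar reason: it holds state (such as a lock or an open connection) that pickle cannot serialize.

```python
import pickle
import threading


class FakeKey:
    """Hypothetical stand-in for an object like boto's Key (not boto's real class)."""

    def __init__(self, name):
        self.name = name
        # A lock is one common example of state that pickle refuses to serialize.
        self._lock = threading.Lock()


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# The key's name is a plain string and pickles fine; the object holding a lock does not.
print(is_picklable("some/key/name"))
print(is_picklable(FakeKey("some/key/name")))
```

A quick check like `is_picklable` can help tell apart the data that may cross a process boundary from the data that has to be rebuilt on the other side.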

Do you also have these problems? Is there a way I can serialize these more complex objects? Do I have to write my own pickler for these particular objects? If I do, how do I tell IPython Parallel to use it? How do I write a pickler?

Thanks!


(*) I'm aware that I can simply make a list of the key names (call it keyNameList) and do something like this:

loadbalancer.map(lambda keyname: getKey(keyname).delete(), keyNameList)

and define the getKey function in each engine of the IPython cluster. This is just a particular instance of a more general problem that I find often. Maybe it's a bad example, since it can be easily solved in another way.
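The workaround in this footnote generalizes: ship only picklable identifiers and rebuild the heavy object on the worker. The sketch below shows one way that could look, using only the standard library. `FakeConnection`, `get_connection` and `delete_by_name` are hypothetical names invented for illustration, and the built-in `map` stands in for `loadbalancer.map` so the example runs without a cluster.

```python
import pickle
import threading

# One connection per worker process, created lazily on first use.
_connection = None


class FakeConnection:
    """Hypothetical stand-in for a boto connection; holds unpicklable state."""

    def __init__(self):
        self._lock = threading.Lock()
        self.deleted = []

    def delete(self, key_name):
        # Record the deletion; a real connection would call the remote service here.
        with self._lock:
            self.deleted.append(key_name)
        return key_name


def get_connection():
    """Return this process's connection, creating it the first time it is needed."""
    global _connection
    if _connection is None:
        _connection = FakeConnection()
    return _connection


def delete_by_name(key_name):
    """Worker function: receives only a string and rebuilds everything else locally."""
    return get_connection().delete(key_name)


key_names = ["a", "b", "c"]
pickle.dumps(key_names)  # the only data that has to travel is picklable

# Built-in map is a local stand-in for a parallel map over engines.
results = list(map(delete_by_name, key_names))
print(results)
```

With IPython Parallel the same shape would presumably be `loadbalancer.map(delete_by_name, key_names)`, provided each engine can create its own connection. Caching the connection at module level avoids reconnecting for every key.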

Rafael S. Calsaverini asked May 22 '26 10:05


1 Answer

IPython Parallel has a use_dill option: if you have the dill serializer installed, it can serialize most "unpicklable" objects.
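As a rough illustration of what dill adds, the sketch below round-trips a lambda, which the stock pickler typically rejects. It assumes the third-party `dill` package is installed and skips the demo otherwise; `roundtrip_with_dill` is a hypothetical helper name, not part of dill or IPython.

```python
try:
    import dill  # third-party; install with `pip install dill`
except ImportError:
    dill = None  # the demo below is skipped when dill is absent


def roundtrip_with_dill(obj):
    """Serialize obj with dill and load it back, as a transport layer would."""
    return dill.loads(dill.dumps(obj))


if dill is not None:
    # A lambda is a typical object that plain pickle refuses but dill accepts.
    add_one = roundtrip_with_dill(lambda x: x + 1)
    print(add_one(41))
```

In IPython Parallel, my understanding is that this is switched on by calling `use_dill()` on a direct view (for example `client[:].use_dill()`), which enables dill locally and on the engines. Treat that call as an assumption to check against the documentation for your version. Even with dill, objects tied to live resources such as open sockets may still fail to serialize.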

Related question: "How can I use dill instead of pickle with load_balanced_view"

Mike McKerns answered May 23 '26 22:05


