I find that defining functions which ask the user to define and then pass in another function is a very natural design pattern for me. For example,
def gradient_descent(x0, grad_f):
    x = x0
    for _ in range(100):
        x -= 0.1 * grad_f(x)
    return x
implements a generic gradient descent routine; all the user has to do is define the gradient function for f. This is basically the interface used by scipy.optimize, and the programs I write tend to use various function closures and dynamically defined functions in a similar way.
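The gradient usually ends up being a closure over problem data. A toy example (the quadratic objective here is just for illustration):

def make_grad(a):
    # Gradient of f(x) = (x - a)**2, closing over the data point a.
    def grad_f(x):
        return 2 * (x - a)
    return grad_f

x_min = gradient_descent(10.0, make_grad(3.0))  # converges toward 3.0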
However, I have run into some serious difficulties in taking advantage of parallelism with multiprocessing, since closures and dynamically defined functions can't be pickled. I know that there are ways around this, but it makes me question whether programming like this is even a "pythonic" way to do things.
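Concretely, here's the kind of failure I keep hitting, reusing make_grad from the toy example above:

from multiprocessing import Pool

if __name__ == "__main__":
    grad_f = make_grad(3.0)
    with Pool(2) as pool:
        # AttributeError: Can't pickle local object 'make_grad.<locals>.grad_f'
        print(pool.map(grad_f, [0.0, 1.0, 2.0]))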
Is this a natural design pattern in Python? Is there a better way to design programs that will likely need to be refactored to use multiple processes?
This is perfectly Pythonic, but you have to write a pickler for your closures.
Python doesn't do it for you automatically because there are a few different options you might want. In particular, you have to decide how far you want to "fake the closureness". Do you just want the captured values copied? Or do you want to copy the whole stack frame and capture cells out of that? Or do you want to actually insert a Manager or the like in the way to force the captures to stay in sync with the parent?
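For example, if the answer is "just copy the captured values", a pickler along these lines does it. This is only a sketch, assuming Python 3.8+ (for types.CellType and the reducer_override hook); marshal ties the pickle to one Python version, and the rebuilt function gets this module's globals rather than the original's, which is exactly the kind of detail dill and cloudpickle handle more carefully:

import io
import marshal
import pickle
import types

def _rebuild(code_bytes, name, defaults, cell_values):
    # Recreate the function: unmarshal its code object and build fresh
    # cells holding (copies of) the captured values.
    code = marshal.loads(code_bytes)
    closure = tuple(types.CellType(v) for v in cell_values)
    return types.FunctionType(code, globals(), name, defaults, closure)

class ClosurePickler(pickle.Pickler):
    # Pickle closures by copying their captured values; delegate
    # everything else to the default machinery.
    def reducer_override(self, obj):
        if isinstance(obj, types.FunctionType) and obj.__closure__:
            cells = tuple(c.cell_contents for c in obj.__closure__)
            return _rebuild, (marshal.dumps(obj.__code__), obj.__name__,
                              obj.__defaults__, cells)
        return NotImplemented

def dumps(obj):
    buf = io.BytesIO()
    ClosurePickler(buf).dump(obj)
    return buf.getvalue()

With this, pickle.loads(dumps(g)) rebuilds a working copy of a closure g, with the captured values copied rather than kept in sync with the parent.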
Once you decide exactly what rules you want to apply, you can write code that does that. Read the pickle docs for details, and also see the multiprocessing docs and the linked source to see how it extends pickle in other ways.
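For a taste of how multiprocessing does it: everything it sends between processes goes through its own Pickler subclass, and you can register extra reducers on it. A sketch with a made-up Handle type (ForkingPickler lives in multiprocessing.reduction and is nominally an implementation detail):

from multiprocessing.reduction import ForkingPickler

class Handle:
    # Stand-in for some type the default pickler chokes on.
    def __init__(self, fd):
        self.fd = fd

def _rebuild_handle(fd):
    return Handle(fd)

def _reduce_handle(h):
    return _rebuild_handle, (h.fd,)

# From now on, every Pool/Queue/Pipe transfer in this process
# serializes Handle objects with this reducer.
ForkingPickler.register(Handle, _reduce_handle)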
But the good news is that what you want is most likely going to be either exactly what dill does, or exactly what cloudpickle does.
In general:

dill tries to be as portable as possible, so you can save the pickles to disk and use them later, even if that means some things you probably don't care about are slightly different under the covers.

cloudpickle tries to be as exact as possible, even if that means the pickles don't work in anything but an exact clone of your process.

If neither of them is exactly what you want, you can of course look at the source for both and work out how to do exactly what you do want.

Here's a trivial closure:
def f():
    def g(): return i
    i = 1
    return g

g = f()
Compare:
>>> import pickle, dill, cloudpickle
>>> pickle.dumps(g)
AttributeError: Can't pickle local object 'f.<locals>.g'
>>> dill.loads(dill.dumps(g))
<function __main__.g>
>>> dill.loads(dill.dumps(g)).__closure__
(<cell at 0x108819618: int object at 0x1009e0980>,)
>>> dill.loads(dill.dumps(g))()
1
>>> cloudpickle.loads(cloudpickle.dumps(g))
<function __main__.f.<locals>.g>
>>> cloudpickle.loads(cloudpickle.dumps(g)).__closure__
(<cell at 0x108819618: int object at 0x1009e0980>,)
>>> cloudpickle.loads(cloudpickle.dumps(g))()
1
Notice that both of them end up generating a closure that captures one cell referencing the value 1, but cloudpickle got the name exactly right, while dill didn't. If you try to pickle.dumps the dill version, you'll get an error about it not being the same object as __main__.g, while if you try to pickle.dumps the cloudpickle version, you'll get exactly the same error about pickling local objects as you started with.
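And if multiprocessing is the end goal, one common workaround is to ship the closure as dill (or cloudpickle) bytes through a stock Pool, so that the only function the standard pickler has to find by name is a top-level trampoline. A sketch, assuming dill is installed:

import dill
from multiprocessing import Pool

def _run(payload):
    # Top-level, so the stock pickler can locate it by name in the worker.
    fn, arg = dill.loads(payload)
    return fn(arg)

def pmap(fn, args, processes=4):
    with Pool(processes) as pool:
        return pool.map(_run, [dill.dumps((fn, arg)) for arg in args])

if __name__ == "__main__":
    def make_adder(n):
        def add(x):
            return x + n
        return add

    print(pmap(make_adder(10), range(5)))  # [10, 11, 12, 13, 14]

The closure itself travels as an opaque bytes payload, so the "can't pickle local object" limitation only ever applies to _run.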