 

Is it a natural design pattern to use closures and dynamically defined functions in Python?

I find writing functions that ask the user to define and then pass in another function to be a very natural design pattern for me. For example,

def gradient_descent(x0, grad_f):
    # Take 100 fixed-size gradient steps from the starting point x0.
    x = x0
    for _ in range(100):
        x -= 0.1 * grad_f(x)
    return x

implements a generic gradient descent routine; all the user has to do is define the gradient function for f. This is basically the interface used by scipy.optimize, and the programs I write tend to use function closures and dynamically defined functions in a similar way.
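For instance, the kind of closure-based usage I have in mind looks like this (make_grad and the quadratic objective are just for illustration):

import numpy as np

def make_grad(A, b):
    # Closure over the problem data: gradient of 0.5 * ||Ax - b||^2.
    def grad_f(x):
        return A.T @ (A @ x - b)
    return grad_f

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 3.0])
x_min = gradient_descent(np.zeros(2), make_grad(A, b))  # approaches [1., 3.]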

However, I have run into some serious difficulty trying to take advantage of parallelism with multiprocessing, since locally defined functions and closures can't be pickled. I know there are ways around this, but it makes me question whether programming like this is even a "pythonic" way to do things.
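A minimal sketch of the failure mode I keep hitting:

import multiprocessing

def make_grad(a):
    def grad(x):  # local function capturing a
        return 2 * a * x
    return grad

if __name__ == '__main__':
    grad = make_grad(3.0)
    with multiprocessing.Pool(2) as pool:
        # AttributeError: Can't pickle local object 'make_grad.<locals>.grad'
        print(pool.map(grad, [1.0, 2.0]))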

Is this a natural design pattern in Python? Is there a better way to design programs that will likely need to be refactored to use multiple processes?

asked Oct 16 '22 by RJTK

1 Answer

This is perfectly Pythonic, but you have to write a pickler for your closures.

Python doesn't do it for you automatically because there are a few different options you might want. In particular, you have to decide how far you want to "fake the closureness". Do you just want the captured values copied? Or do you want to copy the whole stack frame and capture cells out of that? Or do you want to actually insert a Manager or the like in the way to force the captures to stay in sync with the parent?

Once you decide exactly what rules you want to apply, you can write code that does that. Read the pickle docs for details, and also see the multiprocessing docs and the linked source to see how it extends pickle in other ways.
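For example, here is a minimal sketch of the first option, copying the captured values, built on the reducer_override hook (Python 3.8+) described in the pickle docs. ClosurePickler, _rebuild, and dumps are names I made up for the sketch, and note that marshal ties the result to a single Python version:

import io
import marshal
import pickle
import types

def _rebuild(code_bytes, name, cell_values):
    # Rebuild the function with fresh cells holding copies of the captured values.
    # Using this module's globals() is a simplification.
    code = marshal.loads(code_bytes)
    cells = tuple(types.CellType(v) for v in cell_values)
    return types.FunctionType(code, globals(), name, None, cells)

class ClosurePickler(pickle.Pickler):
    def reducer_override(self, obj):
        # Only intervene for local functions, which plain pickle rejects.
        if isinstance(obj, types.FunctionType) and '<locals>' in obj.__qualname__:
            values = tuple(c.cell_contents for c in (obj.__closure__ or ()))
            return _rebuild, (marshal.dumps(obj.__code__), obj.__name__, values)
        return NotImplemented  # everything else pickles normally

def dumps(obj):
    buf = io.BytesIO()
    ClosurePickler(buf).dump(obj)
    return buf.getvalue()

A plain pickle.loads can read the result, because _rebuild is an ordinary module-level function the unpickler can import by name.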


But the good news is what you want is most likely going to be either exactly what dill does, or exactly what cloudpickle does.

In general:

  • dill tries to be as portable as possible, so you can save the pickles to disk and use them later, even if that means some things you probably don't care about are slightly different under the covers.
  • cloudpickle tries to be as exact as possible, even if that means the pickles don't work in anything but an exact clone of your process.

If neither of them is exactly what you want, you can of course look at the source for both and work out how to do exactly what you do want.
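For the multiprocessing use case in the question, one common workaround along these lines is to ship the closure as dill bytes and have a plain module-level function unpickle and call it inside the worker. pmap and _run here are just a sketch of that idea, not anything either library provides:

import multiprocessing

import dill

def _run(payload, arg):
    # Runs in the worker: rebuild the closure from its dill serialization.
    fn = dill.loads(payload)
    return fn(arg)

def pmap(fn, args, processes=4):
    # payload is plain bytes, so the standard pickler is happy to ship it.
    payload = dill.dumps(fn)
    with multiprocessing.Pool(processes) as pool:
        return pool.starmap(_run, [(payload, a) for a in args])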

Here's a trivial closure:

def f():
    def g():
        return i
    i = 1
    return g

g = f()

Compare:

>>> import pickle, dill, cloudpickle
>>> pickle.dumps(g)
AttributeError: Can't pickle local object 'f.<locals>.g'
>>> dill.loads(dill.dumps(g))
<function __main__.g>
>>> dill.loads(dill.dumps(g)).__closure__
(<cell at 0x108819618: int object at 0x1009e0980>,)
>>> dill.loads(dill.dumps(g))()
1
>>> cloudpickle.loads(cloudpickle.dumps(g))
<function __main__.f.<locals>.g>
>>> cloudpickle.loads(cloudpickle.dumps(g)).__closure__
(<cell at 0x108819618: int object at 0x1009e0980>,)
>>> cloudpickle.loads(cloudpickle.dumps(g))()
1

Notice that both of them end up generating a closure that captures one cell referencing the value 1, but cloudpickle got the qualified name exactly right, while dill didn't. If you try to pickle.dumps the dill version, you'll get an error saying g is not the same object as __main__.g, while if you try to pickle.dumps the cloudpickle version you'll get exactly the same error about pickling local objects that you started with.

answered Oct 20 '22 by abarnert