In the process of using joblib to parallelize some model-fitting code involving Theano functions, I've stumbled across some behavior that seems odd to me.
Consider this very simplified example:
```python
from joblib import Parallel, delayed
import theano
from theano import tensor as te
import numpy as np


class TheanoModel(object):
    def __init__(self):
        X = te.dvector('X')
        Y = (X ** te.log(X ** 2)).sum()
        self.theano_get_Y = theano.function([X], Y)

    def get_Y(self, x):
        return self.theano_get_Y(x)


def run(niter=100):
    x = np.random.randn(1000)
    model = TheanoModel()
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')

    # this fails with `TypeError: can't pickle instancemethod objects`...
    results = pool(delayed(model.get_Y)(x) for _ in xrange(niter))

    # # ... but this works! Why?
    # results = pool(delayed(model.theano_get_Y)(x) for _ in xrange(niter))


if __name__ == '__main__':
    run()
```
I understand why the first case fails, since `.get_Y()` is clearly an instancemethod of `TheanoModel`. What I don't understand is why the second case works, since `X`, `Y` and `theano_get_Y()` are only declared within the `__init__()` method of `TheanoModel`. `theano_get_Y()` can't be evaluated until the `TheanoModel` instance has been created. Surely, then, it should also be considered an instancemethod, and should therefore be unpickleable? In fact, it still works even if I explicitly declare `X` and `Y` to be attributes of the `TheanoModel` instance.
Can anyone explain what's going on here?
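The difference comes from how attribute lookup works, not from where the object was created. A minimal Python 3 sketch, using a hypothetical `CompiledFunction` class as a stand-in for the object `theano.function` returns (Theano itself is not needed): `get_Y` is defined on the class, so `model.get_Y` produces a bound method that carries `self`, while `theano_get_Y` is an ordinary value stored in the instance's `__dict__`, so looking it up returns that object unchanged, with no binding.

```python
import types


class CompiledFunction(object):
    """Hypothetical stand-in for the callable object returned by theano.function."""

    def __call__(self, x):
        return sum(v ** 2 for v in x)


class TheanoModel(object):
    def __init__(self):
        # Stored in the instance's __dict__: attribute lookup returns it as-is.
        self.theano_get_Y = CompiledFunction()

    def get_Y(self, x):
        # Defined on the class: looking it up on an instance binds self.
        return self.theano_get_Y(x)


model = TheanoModel()

print(isinstance(model.get_Y, types.MethodType))         # True: a bound method
print(model.get_Y.__self__ is model)                      # True: it carries the instance
print('theano_get_Y' in vars(model))                      # True: an instance attribute
print(isinstance(model.theano_get_Y, types.MethodType))  # False: no binding happens
print(type(model.theano_get_Y).__name__)                  # CompiledFunction
```

Being created inside `__init__` does not turn an object into a method; only functions (more generally, descriptors) found on the class are bound on lookup. Note also that in Python 3.4 and later, bound methods are themselves picklable (as a reference to the instance plus the method name), so the `can't pickle instancemethod objects` error is specific to Python 2.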
Just to illustrate why I think this behaviour is particularly weird, here are a few examples of some other callable member objects that don't take `self` as the first argument:
```python
from joblib import Parallel, delayed
import theano
from theano import tensor as te
import numpy as np


class TheanoModel(object):
    def __init__(self):
        X = te.dvector('X')
        Y = (X ** te.log(X ** 2)).sum()
        self.theano_get_Y = theano.function([X], Y)

        def square(x):
            return x ** 2

        self.member_function = square
        self.static_method = staticmethod(square)
        self.lambda_function = lambda x: x ** 2


def run(niter=100):
    x = np.random.randn(1000)
    model = TheanoModel()
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')

    # # not allowed: `TypeError: can't pickle function objects`
    # results = pool(delayed(model.member_function)(x) for _ in xrange(niter))

    # # not allowed: `TypeError: can't pickle function objects`
    # results = pool(delayed(model.lambda_function)(x) for _ in xrange(niter))

    # # also not allowed: `TypeError: can't pickle staticmethod objects`
    # results = pool(delayed(model.static_method)(x) for _ in xrange(niter))

    # but this is totally fine!?
    results = pool(delayed(model.theano_get_Y)(x) for _ in xrange(niter))


if __name__ == '__main__':
    run()
```
None of them are pickleable with the exception of the `theano.function`!
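The failures in the second example follow from how `pickle` treats plain functions: it does not serialize their code, it saves a reference (module plus qualified name) and looks the function up again when loading. A nested function or a lambda created inside `__init__` has no such importable name, so no reference can be written. A small Python 3 sketch; the class and helper names are illustrative, and the broad `except` is there because the exact exception type differs between Python versions:

```python
import pickle


def square(x):
    # Module-level function: pickle records it as a reference to module.square.
    return x ** 2


class Model(object):
    def __init__(self):
        def local_square(x):
            return x ** 2

        # Both of these exist only inside __init__'s local scope, so pickle
        # cannot find them again by name when it tries to save a reference.
        self.member_function = local_square
        self.lambda_function = lambda x: x ** 2
        # The same function object as the module-level name, so it pickles.
        self.module_function = square


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


model = Model()
for name in ('member_function', 'lambda_function', 'module_function'):
    print(name, can_pickle(getattr(model, name)))
```

Under these assumptions, moving `square` to module level is enough to make the `member_function` case picklable.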
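Theano functions aren't Python functions. Instead, they are Python objects whose class defines `__call__`. This means that you can call them just like a function, but internally they are instances of a custom class. `pickle` saves such an instance as a reference to its class plus its state, not as code, so it does not matter that the object was created inside `__init__`: as long as the class and its state are picklable (as they are for Theano functions), the object can be pickled and sent to the worker processes.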
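A minimal sketch of the same mechanism outside Theano, using a hypothetical `Squarer` class: an instance of a module-level class with `__call__` survives a pickle round-trip even when it is created inside `__init__`, whereas a lambda created in the same place does not.

```python
import pickle


class Squarer(object):
    """Hypothetical callable class, analogous in spirit to a Theano function object."""

    def __init__(self, power=2):
        # Instance state: this is what pickle actually serializes.
        self.power = power

    def __call__(self, x):
        return x ** self.power


class Model(object):
    def __init__(self):
        # Created inside __init__, but it is an instance of a module-level
        # class, so pickle can save it as (class reference, instance state).
        self.callable_object = Squarer()
        self.lambda_function = lambda x: x ** 2


model = Model()

restored = pickle.loads(pickle.dumps(model.callable_object))
print(restored(3))  # 9

try:
    pickle.dumps(model.lambda_function)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print('lambda not picklable:', type(exc).__name__)
```

The design point is that picklability depends on whether `pickle` can name the object's class (or, for functions, the function itself) at module level, not on where the object was constructed.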