I need to implement a multiprocessing pool that utilizes arbitrary packages for calculations. For this, I'm using Python and joblib 0.9.0. This code is basically the structure I want.
import numpy as np
from joblib import pool
def someComputation(x):
return np.interp(x, [-1, 1], [-1, 1])
if __name__ == '__main__':
some_set_of_numbers = [-1,-0.5,0,0.5,1]
the_pool = pool.Pool(processes=2)
solutions = [the_pool.apply_async(someComputation, (x,)) for x in some_set_of_numbers]
print(solutions[0].get())
On both Windows 10 and Red Hat Enterprise Linux running Anaconda 4.3.1 Python 3.6.0 (as well as 3.5 and 3.4 with virtual envs), I get that 'np' was never passed into the someComputation() function raising the error
File "C:\Anaconda3\lib\site-packages\multiprocessing_on_dill\pool.py", line 608, in get
raise self._value
NameError: name 'np' is not defined
however, on my Mac OS X 10.11.6 running Python 3.5 and the same joblib, I get the expected output of '-1' with the exact same code. This question is essentially the same, but it dealt with pathos and not joblib. The general answer was to include the numpy import statement inside of the function
from joblib import pool
def someComputation(x):
import numpy as np
return np.interp(x, [-1, 1], [-1, 1])
if __name__ == '__main__':
some_set_of_numbers = [-1,-0.5,0,0.5,1]
the_pool = pool.Pool(processes=2)
solutions = [the_pool.apply_async(someComputation, (x,)) for x in some_set_of_numbers]
print(solutions[0].get())
This solves the issue on the Windows and Linux machines, where they now output '-1' as expected but this solution seems clunky. Is there any reason why the first bit of code would work on a Mac, but not Windows or Linux? I ultimately need to run this code on the Linux machine so is there any fix that doesn't include putting the import statement inside of the function?
Edit:
After investigating a bit further, I found an old workaround I put in years ago that looks like is causing the issue. In joblib/pool.py, I changed line 44 from
from multiprocessing.pool import Pool
to
from multiprocessing_on_dill.pool import Pool
to support pickling of arbitrary functions. For some reason, this change is what really causes the issue on Windows and Linux, but the Mac machine runs just fine. Using multiprocessing instead of multiprocessing_on_dill solves the above issue, but the code doesn't work for the majority of my cases since they can't be pickled.
I am not sure what the exact issue is, but it appears that there is some problem with transferring the global scope over to the subprocesses that run the task. You can potentially avoid name errors by binding the name np
as a function parameter:
def someComputation(x, np=np):
return np.interp(x, [-1, 1], [-1, 1])
This has the advantage of not requiring a call to the import machinery every time the function is run. The name np
will be bound to the function when it is first evaluated during module loading.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With