Python multiprocessing pool function not defined

I need to implement a multiprocessing pool whose workers use arbitrary packages for their calculations. I'm using Python with joblib 0.9.0, and this code is basically the structure I want:

import numpy as np
from joblib import pool

def someComputation(x):
    return np.interp(x, [-1, 1], [-1, 1])

if __name__ == '__main__':
    some_set_of_numbers = [-1,-0.5,0,0.5,1]
    the_pool = pool.Pool(processes=2)
    solutions = [the_pool.apply_async(someComputation, (x,)) for x in some_set_of_numbers]
    print(solutions[0].get())

On both Windows 10 and Red Hat Enterprise Linux running Anaconda 4.3.1 with Python 3.6.0 (as well as 3.5 and 3.4 in virtual environments), the name 'np' is not available inside someComputation(), and the call raises this error:

File "C:\Anaconda3\lib\site-packages\multiprocessing_on_dill\pool.py", line 608, in get
    raise self._value
NameError: name 'np' is not defined

However, on my Mac (OS X 10.11.6, Python 3.5, same joblib), the exact same code gives the expected output of '-1'. This question is essentially the same, but it dealt with pathos rather than joblib. The general answer there was to put the numpy import statement inside the function:

from joblib import pool

def someComputation(x):
    import numpy as np
    return np.interp(x, [-1, 1], [-1, 1])

if __name__ == '__main__':
    some_set_of_numbers = [-1,-0.5,0,0.5,1]
    the_pool = pool.Pool(processes=2)
    solutions = [the_pool.apply_async(someComputation, (x,)) for x in some_set_of_numbers]
    print(solutions[0].get())

This solves the issue on the Windows and Linux machines, which now output '-1' as expected, but the solution seems clunky. Is there any reason why the first version would work on a Mac but not on Windows or Linux? I ultimately need to run this code on the Linux machine, so is there a fix that doesn't involve putting the import statement inside the function?

Edit:

After investigating a bit further, I found an old workaround I put in years ago that appears to be causing the issue. In joblib/pool.py, I changed line 44 from

from multiprocessing.pool import Pool

to

from multiprocessing_on_dill.pool import Pool

to support pickling of arbitrary functions. For some reason, this change is what actually causes the issue on Windows and Linux, while the Mac machine runs just fine. Using multiprocessing instead of multiprocessing_on_dill solves the above issue, but then the code doesn't work for the majority of my cases, since they can't be pickled.
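For context on why the stock pool rejects such cases: the standard pickle module serializes a function by reference (its module plus qualified name), so anything that cannot be looked up by name, such as a lambda, fails to pickle. A minimal sketch using only the standard library; the variable name lambda_pickled is made up for illustration:

```python
import pickle

# Standard pickle stores functions by reference, not by value, so a lambda
# (which has no importable name) cannot be serialized.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(lambda_pickled)
```

Serializers such as dill work around this by pickling the function body itself, which is why the swap to multiprocessing_on_dill was made in the first place.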

Michael Sparapany asked Oct 18 '22 12:10


1 Answer

I am not sure what the exact issue is, but it appears that there is some problem with transferring the module's global scope to the subprocesses that run the task. You can potentially avoid the NameError by binding the name np as a default argument of the function:

def someComputation(x, np=np):
    return np.interp(x, [-1, 1], [-1, 1])

This has the advantage of not requiring a call to the import machinery every time the function is run. The default value is evaluated once, when the def statement executes during module loading, so np stays bound to the function object from then on and no global lookup is needed at call time.
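The difference between a global lookup and a default-argument binding can be sketched with the standard library alone. This example uses math in place of numpy, and the helper strip_globals is hypothetical: it only emulates a function arriving in a worker without its module-level names, and is not a claim about what multiprocessing_on_dill actually does internally.

```python
import math
import types

def uses_global(x):
    # Looks up `math` in the function's global namespace at call time.
    return math.sqrt(x)

def uses_default(x, math=math):
    # `math` is a parameter; its default was captured when `def` executed.
    return math.sqrt(x)

def strip_globals(func):
    # Hypothetical helper: rebuild func with an empty global namespace,
    # keeping its code object and its default arguments.
    return types.FunctionType(func.__code__, {}, func.__name__, func.__defaults__)

try:
    strip_globals(uses_global)(9.0)
    global_lookup_failed = False
except NameError:
    global_lookup_failed = True

default_result = strip_globals(uses_default)(9.0)
print(global_lookup_failed, default_result)
```

The version with the default argument keeps working because the module object travels with the function in its __defaults__ tuple rather than being resolved through globals.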

Mad Physicist answered Nov 03 '22 07:11