I'm trying to run some sample code for Python's multiprocessing.Pool, found on the web. The code is:
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    inputs = [0, 1, 2, 3, 4]
    outputs = pool.map(square, inputs)
But when I try to run it, it never finishes executing and I have to restart the kernel of my IPython Notebook. What's the problem?
Calling the start() function on a terminated process will result in an AssertionError indicating that the process can only be started once. Instead, to restart a process in Python, you must create a new instance of the process with the same configuration and then call the start() function.
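A minimal sketch of that restart pattern, meant to be run as a script file rather than from a notebook. The names work, make_worker and restart_demo are made up for illustration; only Process, start() and join() come from multiprocessing.

```python
from multiprocessing import Process


def work():
    # Placeholder task; a real worker would do something useful here.
    pass


def make_worker():
    # Build a fresh Process with the same configuration each time.
    return Process(target=work, name="worker")


def restart_demo():
    first = make_worker()
    first.start()
    first.join()

    try:
        first.start()  # a finished Process cannot be started a second time
        reused = True
    except AssertionError:
        reused = False

    second = make_worker()  # new instance, same configuration
    second.start()
    second.join()
    return reused, second.exitcode


if __name__ == "__main__":
    print(restart_demo())
```

Wrapping the configuration in a small factory function keeps the "same configuration" in one place, so every restart builds an identical Process.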
The multiprocessing version is slower because it needs to reload the model in every map call, since the mapped functions are assumed to be stateless. The multiprocessing version looks as follows. Note that in some cases, it is possible to avoid the repeated reload by using the initializer argument to multiprocessing.Pool.
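A rough sketch of the initializer approach, not the listing the paragraph above refers to. load_model, init_worker and predict are hypothetical names, and the "model" is just a dictionary standing in for something expensive to load. As with any Pool code, it is meant to be run from a script file.

```python
from multiprocessing import Pool

_model = None  # per-worker global, filled in once by the initializer


def load_model():
    # Hypothetical stand-in for an expensive model load.
    return {"scale": 2}


def init_worker():
    # Runs once in each worker process when the pool starts.
    global _model
    _model = load_model()


def predict(x):
    # Uses the model already loaded in this worker instead of reloading it.
    return _model["scale"] * x


if __name__ == "__main__":
    with Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(predict, [1, 2, 3]))
```

The model is loaded once per worker process rather than once per task, which is where the saving would come from.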
It works like a map-reduce architecture. It maps the input to the different processors and collects the output from all the processors. After the execution of code, it returns the output in form of a list or array. It waits for all the tasks to finish and then returns the output.
As you may read from the answer pointed out by John in the comments, multiprocessing.Pool, in general, should not be expected to work well within an interactive interpreter. To understand why that is the case, consider how Pool does its job: it starts worker processes, tells each of them to import <this file>, and has them listen for messages from the master.
When you try to perform this procedure from an interactive prompt, there is no reasonable "current Python file" to pass to the children for importing. Moreover, the functions you defined in your interactive prompt are not part of any module (they are dynamically defined), and hence cannot be imported by the children from that nonexistent module. So your easiest bet is to simply avoid using multiprocessing within IPython. IPython parallel is so much better anyway :)
For completeness' sake I also checked what exactly happens in my particular case of an IPython 4 running under Python 2.7 on Windows 8 (where I can observe the interpreter getting stuck as well). Interestingly, the reason IPython gets stuck in the first place is not one of those mentioned above.
It turns out that multiprocessing checks whether __main__.__file__ is defined, and if not, sends sys.argv[0] as the "current filename" to the children. In the case of (my version of) IPython, sys.argv[0] is equal to C:\Dev\Anaconda\lib\site-packages\ipykernel\__main__.py.
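The fallback just described can be paraphrased in a few lines. This is a sketch of the described behavior, not a copy of the standard library source, and guess_main_path is a made-up name.

```python
import sys
import types


def guess_main_path(main_module, argv):
    """Return the file path a child would be told to import (illustrative only)."""
    main_file = getattr(main_module, "__file__", None)
    if main_file:
        return main_file
    # No __file__ on __main__ (e.g. an interactive session): fall back to argv[0].
    return argv[0]


if __name__ == "__main__":
    # In a plain script this prints the script path; in a notebook kernel
    # it would print whatever sys.argv[0] happens to be.
    print(guess_main_path(sys.modules["__main__"], sys.argv))
```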
Unfortunately, before starting up, the worker processes check whether the file they are going to import is already in their sys.modules. Line 488 of multiprocessing/forking.py says:
assert main_name not in sys.modules, main_name
When the main_name is __main__ (as is the case with IPython's workers), this assertion fails and the workers fail to start. The same code, however, is "smart" enough to check whether the passed name is ipython, in which case it performs no such check and does not import anything.
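To make the two branches concrete, here is a simplified model of the worker-side decision as described above. plan_main_import is a hypothetical helper and loaded_modules stands in for the worker's sys.modules; the real forking.py does more than this. ntpath is used only so that the Windows path parses on any operating system.

```python
import ntpath


def plan_main_import(main_path, loaded_modules):
    """Model what a worker does with the main file path it receives."""
    main_name = ntpath.splitext(ntpath.basename(main_path))[0]
    if main_name == "ipython":
        # Special case: no check is made and nothing is imported.
        return "skip"
    # The check quoted above: the name must not already be loaded.
    assert main_name not in loaded_modules, main_name
    return "import " + main_name


if __name__ == "__main__":
    print(plan_main_import("ipython", {"__main__"}))
    print(plan_main_import("job.py", {"__main__"}))
```

Passing the kernel's __main__.py path with __main__ already in loaded_modules trips the assertion, which mirrors the failure described above.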
Consequently, the problem of workers failing to start could be solved using an ugly hack of defining __main__.__file__ to be equal to ipython. The following code does work fine from an IPython cell:
import sys
sys.modules['__main__'].__file__ = 'ipython'
from multiprocessing import Pool
pool = Pool(processes=4)
inputs = [0, 1, 2, 3, 4]
outputs = pool.map(abs, inputs)
Note that this example asks the workers to compute abs, a built-in function. It would fail (gracefully, with an exception) if you asked the workers to compute a function you defined within the notebook.
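The reason is that functions are pickled by reference (module name plus function name), not by value. The sketch below imitates this without starting any processes: notebook_session is a hypothetical module name standing in for the notebook's __main__, and swapping in an empty module of the same name simulates a worker that never saw the definition.

```python
import pickle
import sys
import types


def worker_lookup_demo():
    name = "notebook_session"  # hypothetical stand-in for the notebook's __main__
    parent_side = types.ModuleType(name)
    exec("def square(x):\n    return x * x", parent_side.__dict__)
    sys.modules[name] = parent_side
    try:
        # Pickling stores only the module name and the function name.
        payload = pickle.dumps(parent_side.square)
        # Simulate a worker: a module of that name exists, but without square.
        sys.modules[name] = types.ModuleType(name)
        try:
            pickle.loads(payload)
            return False
        except (AttributeError, pickle.UnpicklingError):
            return True  # the worker cannot resolve the reference
    finally:
        del sys.modules[name]


if __name__ == "__main__":
    print("lookup failed in simulated worker:", worker_lookup_demo())
```

A built-in such as abs avoids the problem because every worker can resolve it by name on its own.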
It turns out you can, in principle, go further with the hacking and have your functions sent over to the workers using some manual pickling of their code. You can find a pretty cool example of such a hack here.