Multiprocessing AsyncResult.get() hangs in Python 3.7.2 but not in 3.6

I'm trying to port some code from Python 3.6 to Python 3.7 on Windows 10. I see the multiprocessing code hang when calling .get() on the AsyncResult object. The code in question is much more complicated, but I've boiled it down to something similar to the following program.

import multiprocessing


def main(num_jobs):
    num_processes = max(multiprocessing.cpu_count() - 1, 1)
    pool = multiprocessing.Pool(num_processes)

    func_args = []
    results = []

    try:
        for num in range(num_jobs):
            args = (1, 2, 3)
            func_args.append(args)
            results.append(pool.apply_async(print, args))

        for result, args in zip(results, func_args):
            print('waiting on', args)
            result.get()
    finally:
        pool.terminate()
        pool.join()


if __name__ == '__main__':
    main(5)

This code also runs in Python 2.7. For some reason the first call to get() hangs in 3.7, but everything works as expected on other versions.

asked Feb 01 '19 13:02 by durden2.0



1 Answer

I think this is a regression in Python 3.7.2 as described here. It seems to affect only code that runs inside a virtualenv.

For the time being you can work around it by doing what's described in this comment on the bug thread.

import _winapi
import multiprocessing.spawn
multiprocessing.spawn.set_executable(_winapi.GetModuleFileName(0))

That forces the subprocesses to spawn using the real python.exe instead of the one in the virtualenv. It may not be suitable if you're bundling things into an exe with PyInstaller, but it works OK when running from the CLI with a local Python installation.
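If the same code also has to run on other platforms or outside a virtualenv, the workaround can be wrapped in a guard so it only takes effect where the problem shows up. The following is a minimal sketch, not a tested fix: the helper names running_in_venv and apply_spawn_workaround are made up for illustration, and limiting the change to exactly 3.7.2 is an assumption based on the regression being reported against that release.

```python
import multiprocessing.spawn
import sys


def running_in_venv():
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv (and newer virtualenv) make sys.base_prefix differ from sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def apply_spawn_workaround():
    """Point multiprocessing at the real python.exe; return True if applied."""
    # Assumption: only Python 3.7.2 on Windows inside a virtualenv needs this.
    if sys.platform != "win32":
        return False
    if sys.version_info[:3] != (3, 7, 2):
        return False
    if not running_in_venv():
        return False
    import _winapi  # Windows-only private module, so import it lazily

    multiprocessing.spawn.set_executable(_winapi.GetModuleFileName(0))
    return True


if __name__ == "__main__":
    # Call this before creating any Pool or Process.
    applied = apply_spawn_workaround()
    print("workaround applied:", applied)
```

Call apply_spawn_workaround() once at the top of the `if __name__ == '__main__':` block, before multiprocessing.Pool is created, so the worker processes are started with the overridden executable.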

answered Nov 15 '22 00:11 by durden2.0