
How to fix multiprocessing hanging in Python 3.7.2 when using venv

2019-01-12 Update

I reinstalled Python 3.7.1 and recreated the venv, and everything works again.

I still do not know what goes wrong in 3.7.2, though.


I have been using multiprocessing's map_async and apply_async in my data processing project. They worked fine from Python 3.6 through 3.7.1, but after I upgraded to 3.7.2 and recreated the venv, the main process hangs indefinitely and the subprocesses do no work at all.

I am using Windows 10 and PyCharm Community.

I tried creating the venv both with the tool built into PyCharm and with 'python -m venv', but neither helped. I searched the documentation on python.org and found this changelog entry:

https://docs.python.org/3.7/whatsnew/changelog.html#python-3-7-2-final

It says,

"venv on Windows will now use a python.exe redirector rather than copying the actual binaries from the base environment."

I wonder if this has caused the problem.
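
One way to check which interpreter the script actually runs under is to print a few values from the sys module. This is only a diagnostic sketch: describe_interpreter is a hypothetical helper name, and its output shows whether a venv is active, not whether the redirector is the cause of the hang.

```python
import sys


def describe_interpreter():
    """Collect the interpreter details relevant to the venv question."""
    return {
        "version": sys.version.split()[0],
        # Path of the binary that started this process
        # (the venv's python.exe when a virtual environment is active).
        "executable": sys.executable,
        "prefix": sys.prefix,
        # base_prefix points at the installation the venv was created from.
        "base_prefix": sys.base_prefix,
        # Inside a venv created by the venv module the two prefixes differ.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this once with the venv interpreter and once with the system interpreter makes it easy to compare the two setups side by side.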

Example code follows:

from multiprocessing import freeze_support, Pool

def test_func(x):
    y = x + 1
    return y

if __name__ == '__main__':
    freeze_support()
    test_data = list(range(10))
    with Pool(4) as test_pool:
        for test_datum in test_data:
            # args must be a tuple, hence the trailing comma
            apply_result = test_pool.apply_async(test_func, (test_datum,))
            print(apply_result.get())

I added a breakpoint on the last line and ran the script in debug mode. I found that the apply_result object, which is a multiprocessing.pool.ApplyResult, has a _cache attribute. Under _cache there is the same multiprocessing.pool.ApplyResult, shown under the name "0 (140716767896368)", which in turn has a _cache attribute, and so on.

[Screenshot: debug view]

Out of desperation I tried what is probably the simplest possible code (adapted from the official documentation):

from multiprocessing import Pool, freeze_support

def f(x):
    return x*x

if __name__ == '__main__':
    freeze_support()
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))

It still hangs.

If I select the system interpreter instead of the venv, it works fine and prints:

[1, 4, 9]
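
To tell a real hang from slow startup, map_async with a timeout can stand in for map, so the main process raises an error instead of waiting forever. The following is a minimal sketch, not a fix: map_with_timeout is a hypothetical helper name, and the 10-second limit and the pool size of 2 are arbitrary assumptions.

```python
import multiprocessing
from multiprocessing import Pool, freeze_support


def f(x):
    return x * x


def map_with_timeout(func, values, timeout, processes=2):
    """Map func over values in a pool, giving up after `timeout` seconds."""
    with Pool(processes) as pool:
        async_result = pool.map_async(func, values)
        # get() raises multiprocessing.TimeoutError if no result
        # arrives before the timeout expires.
        return async_result.get(timeout=timeout)


if __name__ == "__main__":
    freeze_support()
    try:
        print(map_with_timeout(f, [1, 2, 3], timeout=10))
    except multiprocessing.TimeoutError:
        print("No result within 10 seconds; the workers appear to be stuck.")
```

Leaving the with block terminates the pool, so in the timeout case the script should exit rather than sit in the debugger indefinitely.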

I would sincerely appreciate any help in solving this problem.

Asked by meizhu812 on Jan 11 '19 at 17:01


1 Answer

I had the same problem on a Mac with VS Code.

Here is my solution, which uses joblib:

import joblib
from joblib import Parallel, delayed

def f(x):
    return x * x

# Use one worker per available CPU.
number_of_cpu = joblib.cpu_count()
# delayed(f)(x) captures each call without running it yet.
delayed_funcs = [delayed(f)(x) for x in [1, 2, 3]]
parallel_pool = Parallel(n_jobs=number_of_cpu, prefer="processes")
print(parallel_pool(delayed_funcs))

The joblib documentation covers the details well.

Answered by Lumber Jack on Oct 18 '22 at 10:10