requests.get hangs when called in a multiprocessing.Pool

I have the following code:

def process_url(url):
    print '111'
    r = requests.get(url)
    print '222' # <-- never even gets here
    return


urls_to_download = [list_or_urls]
PARALLEL_WORKERS = 4

pool = Pool(PARALLEL_WORKERS)
pool.map_async(process_url, urls_to_download)
pool.close()
pool.join()

Every time I do this, it runs the first four items and then just hangs. I don't think it's a timeout issue, as it is extremely fast to download the four urls. It is just after fetching those first four it hangs indefinitely.

What do I need to do to remedy this?

asked Sep 20 '14 by David542

1 Answer

The problem

Even though this question uses Python 2, you can still reproduce this "error" in Python 3. It happens because pool.map_async returns an object of class AsyncResult. To receive the result (or the traceback, in case of an error) of the map_async call, you need to call get() on it. Joining the pool will not work here, since the job has already been completed, with the result held in an AsyncResult, which acts much like a Promise.

So, what's the solution?

Simply add a call to get() on the AsyncResult so the parent waits for (and receives) the result:

from multiprocessing import Pool
import requests

def process_url(url):
    print('111')
    r = requests.get(url)
    print('222') # <-- never even gets here (not anymore!)
    return


if __name__ == "__main__":
    urls_to_download = ['https://google.com'] * 4
    PARALLEL_WORKERS = 4

    pool = Pool(PARALLEL_WORKERS)
    a = pool.map_async(process_url, urls_to_download)
    
    # Add call here
    a.get()

    pool.close()
    pool.join()

Output

111
111
111
111
222
222
222
222
answered Mar 12 '24 by Charchit Agarwal