Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What happens when I multiprocessing.pool.apply_async more times than I have processors

I have the following setup:

results = [f(args) for _ in range(10**3)]

But, f(args) takes a long time to compute. So I'd like to throw multiprocessing at it. I would like to do so by doing:

pool = mp.pool(mp.cpu_count() -1) # mp.cpu_count() -> 8
results = [pool.apply_async(f, args) for _ in range(10**3)]

Clearly, I don't have 1000 processors on my computer, so my concern:
Does the above call result in 1000 processes simultaneously competing for CPU time or 7 processes running simultaneously, iteratively computing the next f(args) when the previous call finishes?

I suppose I could do something like pool.async_map(f, (args for _ in range(10**3))) to get the same results, but the purpose of this post is to understand the behavior of pool.apply_async

like image 244
inspectorG4dget Avatar asked May 05 '14 20:05

inspectorG4dget


Video Answer


2 Answers

You'll never have more processes running than there are workers in your pool (in your case mp.cpu_count() - 1. If you call apply_async and all the workers are busy, the task will be queued and executed as soon as a worker frees up. You can see this with a simple test program:

#!/usr/bin/python

import time
import multiprocessing as mp

def worker(chunk):
    print('working')
    time.sleep(10)
    return

def main():
    pool = mp.Pool(2)  # Only two workers
    for n in range(0, 8):
        pool.apply_async(worker, (n,))
        print("called it")
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

The output is like this:

called it
called it
called it
called it
called it
called it
called it
called it
working
working
<delay>
working
working
<delay>
working 
working
<delay>
working
working
like image 71
dano Avatar answered Oct 23 '22 09:10

dano


The number of worker processes is wholly controlled by the argument to mp.pool(). So if mp.cpu_count() returns 8 on your box, 7 worker processes will be created.

All pool methods (apply_async() among them) then use no more than that many worker processes. Under the covers, arguments are pickled in the main program and sent over an inter-process pipe to worker processes. This hidden machinery effectively creates a work queue, off of which the fixed number of worker processes pull descriptions of work to do (function name + arguments).

Other than that, it's all just magic ;-)

like image 35
Tim Peters Avatar answered Oct 23 '22 10:10

Tim Peters