This example from PyMOTW uses multiprocessing.Pool() with a processes argument (the number of worker processes) set to twice the number of cores on the machine:

pool_size = multiprocessing.cpu_count() * 2

(The class otherwise defaults to just cpu_count().)
Is there any validity to this? What is the effect of creating more workers than there are cores? Is there ever a case to be made for doing this, or will it perhaps impose additional overhead in the wrong direction? I am curious as to why it would be included consistently in examples from what I consider to be a reputable site.
In an initial test, it actually seems to slow things down a bit:
$ python -m timeit -n 25 -r 3 'import double_cpus; double_cpus.main()'
25 loops, best of 3: 266 msec per loop
$ python -m timeit -n 25 -r 3 'import default_cpus; default_cpus.main()'
25 loops, best of 3: 226 msec per loop
double_cpus.py:
import multiprocessing

def do_calculation(n):
    for i in range(n):
        i ** 2

def main():
    with multiprocessing.Pool(
        processes=multiprocessing.cpu_count() * 2,
        maxtasksperchild=2,
    ) as pool:
        pool.map(do_calculation, range(1000))
default_cpus.py:
import multiprocessing

# do_calculation is identical to the one in double_cpus.py

def main():
    # `processes` will default to cpu_count()
    with multiprocessing.Pool(
        maxtasksperchild=2,
    ) as pool:
        pool.map(do_calculation, range(1000))
Doing this can make sense if your job is not purely CPU-bound but also involves some I/O.
The computation in your example is also too short for a reasonable benchmark; the overhead of just creating the extra processes in the first place dominates.
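You can see that startup overhead in isolation by timing a pool over do-nothing tasks. A minimal sketch, not part of the original example (the noop function and file name are mine):

# pool_overhead.py -- rough sketch to isolate worker-startup cost, illustrative only
import multiprocessing
import time

def noop(_):
    pass

def time_pool(n_workers):
    # Mapping trivial tasks means we mostly measure pool creation and IPC overhead.
    start = time.perf_counter()
    with multiprocessing.Pool(processes=n_workers) as pool:
        pool.map(noop, range(1000))
    return time.perf_counter() - start

if __name__ == '__main__':
    n = multiprocessing.cpu_count()
    print('default pool :', round(time_pool(n), 3), 'sec')
    print('doubled pool :', round(time_pool(n * 2), 3), 'sec')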
I modified your calculation to iterate over a range of 10 million, evaluating an if-condition on each iteration and taking a nap whenever it evaluates to True, which happens n_sleep times. That way a total sleep of sleep_sec_total seconds can be injected into the computation.
# default_cpus.py
import time
import multiprocessing

def do_calculation(iterations, n_sleep, sleep_sec):
    for i in range(iterations):
        if i % (iterations // n_sleep) == 0:
            time.sleep(sleep_sec)

def main(sleep_sec_total):
    iterations = int(10e6)
    n_sleep = 100
    sleep_sec = sleep_sec_total / n_sleep
    tasks = [(iterations, n_sleep, sleep_sec)] * 20
    with multiprocessing.Pool(
        maxtasksperchild=2,
    ) as pool:
        pool.starmap(do_calculation, tasks)
# double_cpus.py
...

def main(sleep_sec_total):
    iterations = int(10e6)
    n_sleep = 100
    sleep_sec = sleep_sec_total / n_sleep
    tasks = [(iterations, n_sleep, sleep_sec)] * 20
    with multiprocessing.Pool(
        processes=multiprocessing.cpu_count() * 2,
        maxtasksperchild=2,
    ) as pool:
        pool.starmap(do_calculation, tasks)
I ran the benchmark with sleep_sec_total=0 (purely CPU-bound) and with sleep_sec_total=2 for both modules.

Results with sleep_sec_total=0:
$ python -m timeit -n 5 -r 3 'import default_cpus; default_cpus.main(0)'
5 loops, best of 3: 15.2 sec per loop
$ python -m timeit -n 5 -r 3 'import double_cpus; double_cpus.main(0)'
5 loops, best of 3: 15.2 sec per loop
Given a reasonably sized computation, you'll observe close to no difference between the default and doubled worker counts for a purely CPU-bound task. Here it happened that both tests had the same best time.
Results with sleep_sec_total=2:
$ python -m timeit -n 5 -r 3 'import default_cpus; default_cpus.main(2)'
5 loops, best of 3: 20.5 sec per loop
$ python -m timeit -n 5 -r 3 'import double_cpus; double_cpus.main(2)'
5 loops, best of 3: 17.7 sec per loop
Now, with 2 seconds of sleep added as a stand-in for I/O, the picture looks different. Using twice as many processes gave a speedup of about 3 seconds compared to the default.
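If you want to explore that effect further, one option is to scan several worker-count multipliers against the same task list. A minimal sketch, not part of the benchmark above (the scan script and its names are mine; it reuses do_calculation from default_cpus.py):

# scan_workers.py -- illustrative sketch only
import multiprocessing
import time
from default_cpus import do_calculation

def run(n_workers, sleep_sec_total=2):
    # Same task setup as in the benchmark modules above.
    iterations = int(10e6)
    n_sleep = 100
    tasks = [(iterations, n_sleep, sleep_sec_total / n_sleep)] * 20
    start = time.perf_counter()
    with multiprocessing.Pool(processes=n_workers, maxtasksperchild=2) as pool:
        pool.starmap(do_calculation, tasks)
    return time.perf_counter() - start

if __name__ == '__main__':
    for factor in (1, 2, 4):
        n = multiprocessing.cpu_count() * factor
        print(n, 'workers:', round(run(n), 1), 'sec')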
If your task is I/O-bound (such as waiting on a database or a network service), then creating more threads than there are processors actually increases your throughput.
This is because while a thread is waiting on I/O, the processor can do work on other threads.
If you have a CPU-heavy task, however, running more workers than you have processors will actually slow it down.
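As a rough illustration of the I/O-bound case (this sketch and its parameters are mine; time.sleep stands in for real database or network waits), throughput keeps improving well past the core count:

# io_bound_sketch.py -- illustrative only; sleep() simulates waiting on I/O
from concurrent.futures import ThreadPoolExecutor
import os
import time

def fake_io(_):
    time.sleep(0.1)  # pretend to wait on a database/network response

def run(n_threads, n_tasks=100):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as executor:
        list(executor.map(fake_io, range(n_tasks)))
    return time.perf_counter() - start

if __name__ == '__main__':
    cores = os.cpu_count()
    for n in (cores, cores * 2, cores * 8):
        # With more threads than cores, the waits overlap and total time drops.
        print(n, 'threads:', round(run(n), 2), 'sec')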