This example from PyMOTW uses multiprocessing.Pool() with a processes argument (the number of worker processes) set to twice the number of cores on the machine:

pool_size = multiprocessing.cpu_count() * 2

(The class otherwise defaults to just cpu_count().)
Is there any validity to this? What is the effect of creating more workers than there are cores? Is there ever a case to be made for doing this, or will it perhaps impose additional overhead in the wrong direction? I am curious as to why it would be included consistently in examples from what I consider to be a reputable site.
In an initial test, it actually seems to slow things down a bit:
$ python -m timeit -n 25 -r 3 'import double_cpus; double_cpus.main()'
25 loops, best of 3: 266 msec per loop
$ python -m timeit -n 25 -r 3 'import default_cpus; default_cpus.main()'
25 loops, best of 3: 226 msec per loop
double_cpus.py:
import multiprocessing

def do_calculation(n):
    for i in range(n):
        i ** 2

def main():
    with multiprocessing.Pool(
        processes=multiprocessing.cpu_count() * 2,
        maxtasksperchild=2,
    ) as pool:
        pool.map(do_calculation, range(1000))
default_cpus.py:
import multiprocessing

# do_calculation is identical to the one in double_cpus.py

def main():
    # `processes` will default to cpu_count()
    with multiprocessing.Pool(
        maxtasksperchild=2,
    ) as pool:
        pool.map(do_calculation, range(1000))
Doing this can make sense if your job is not purely CPU-bound but also involves some I/O.
The computation in your example is also too short for a reasonable benchmark; the overhead of just creating the extra processes in the first place dominates.
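You can see that startup overhead in isolation by timing a pool over do-nothing tasks. A minimal sketch, not part of the original example (the noop function and file name are mine):

# pool_overhead.py -- rough sketch to isolate worker-startup cost, illustrative only
import multiprocessing
import time

def noop(_):
    pass

def time_pool(n_workers):
    # Mapping trivial tasks means we mostly measure pool creation and IPC overhead.
    start = time.perf_counter()
    with multiprocessing.Pool(processes=n_workers) as pool:
        pool.map(noop, range(1000))
    return time.perf_counter() - start

if __name__ == '__main__':
    n = multiprocessing.cpu_count()
    print('default pool :', round(time_pool(n), 3), 'sec')
    print('doubled pool :', round(time_pool(n * 2), 3), 'sec')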
I modified your calculation to iterate over a range of 10 million, evaluating an if-condition on each iteration and taking a nap whenever it evaluates to True, which happens n_sleep times. That way a total sleep of sleep_sec_total seconds can be injected into the computation.
# default_cpus.py
import time
import multiprocessing

def do_calculation(iterations, n_sleep, sleep_sec):
    for i in range(iterations):
        if i % (iterations // n_sleep) == 0:
            time.sleep(sleep_sec)

def main(sleep_sec_total):
    iterations = int(10e6)
    n_sleep = 100
    sleep_sec = sleep_sec_total / n_sleep
    tasks = [(iterations, n_sleep, sleep_sec)] * 20
    with multiprocessing.Pool(
        maxtasksperchild=2,
    ) as pool:
        pool.starmap(do_calculation, tasks)
# double_cpus.py
...

def main(sleep_sec_total):
    iterations = int(10e6)
    n_sleep = 100
    sleep_sec = sleep_sec_total / n_sleep
    tasks = [(iterations, n_sleep, sleep_sec)] * 20
    with multiprocessing.Pool(
        processes=multiprocessing.cpu_count() * 2,
        maxtasksperchild=2,
    ) as pool:
        pool.starmap(do_calculation, tasks)
I ran the benchmark with sleep_sec_total=0 (purely CPU-bound) and with sleep_sec_total=2 for both modules.

Results with sleep_sec_total=0:
$ python -m timeit -n 5 -r 3 'import default_cpus; default_cpus.main(0)'
5 loops, best of 3: 15.2 sec per loop
$ python -m timeit -n 5 -r 3 'import double_cpus; double_cpus.main(0)'
5 loops, best of 3: 15.2 sec per loop
Given a reasonably sized computation, you'll observe close to no difference between the default and doubled worker counts for a purely CPU-bound task. Here it happened that both tests had the same best time.
Results with sleep_sec_total=2:
$ python -m timeit -n 5 -r 3 'import default_cpus; default_cpus.main(2)'
5 loops, best of 3: 20.5 sec per loop
$ python -m timeit -n 5 -r 3 'import double_cpus; double_cpus.main(2)'
5 loops, best of 3: 17.7 sec per loop
Now, with 2 seconds of sleep added as a stand-in for I/O, the picture looks different. Using twice as many processes gave a speedup of about 3 seconds compared to the default.
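If you want to explore that effect further, one option is to scan several worker-count multipliers against the same task list. A minimal sketch, not part of the benchmark above (the scan script and its names are mine; it reuses do_calculation from default_cpus.py):

# scan_workers.py -- illustrative sketch only
import multiprocessing
import time
from default_cpus import do_calculation

def run(n_workers, sleep_sec_total=2):
    # Same task setup as in the benchmark modules above.
    iterations = int(10e6)
    n_sleep = 100
    tasks = [(iterations, n_sleep, sleep_sec_total / n_sleep)] * 20
    start = time.perf_counter()
    with multiprocessing.Pool(processes=n_workers, maxtasksperchild=2) as pool:
        pool.starmap(do_calculation, tasks)
    return time.perf_counter() - start

if __name__ == '__main__':
    for factor in (1, 2, 4):
        n = multiprocessing.cpu_count() * factor
        print(n, 'workers:', round(run(n), 1), 'sec')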
If your task is I/O-bound (such as waiting on a database or a network service), then creating more threads than there are processors actually increases your throughput.
This is because while a thread is waiting on I/O, the processor can do work on other threads.
If you have a CPU-heavy task, however, running more workers than you have processors will actually slow it down.
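As a rough illustration of the I/O-bound case (this sketch and its parameters are mine; time.sleep stands in for real database or network waits), throughput keeps improving well past the core count:

# io_bound_sketch.py -- illustrative only; sleep() simulates waiting on I/O
from concurrent.futures import ThreadPoolExecutor
import os
import time

def fake_io(_):
    time.sleep(0.1)  # pretend to wait on a database/network response

def run(n_threads, n_tasks=100):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as executor:
        list(executor.map(fake_io, range(n_tasks)))
    return time.perf_counter() - start

if __name__ == '__main__':
    cores = os.cpu_count()
    for n in (cores, cores * 2, cores * 8):
        # With more threads than cores, the waits overlap and total time drops.
        print(n, 'threads:', round(run(n), 2), 'sec')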