
Why is multiprocessing slower here?

I am trying to speed up some code with multiprocessing in Python, but I cannot understand one point. Assume I have the following dumb function:

import time
from multiprocessing.pool import Pool

def foo(_):
    for _ in range(100000000):
        a = 3 

When I run this code without using multiprocessing (see the code below) on my laptop (Intel, 8-core CPU), the time taken is ~2.31 seconds.

t1 = time.time()
foo(1)
print(f"Without multiprocessing {time.time() - t1}")

Instead, when I run this code using the Python multiprocessing library (see the code below), the time taken is ~6.0 seconds.

pool = Pool(8)
t1 = time.time()
pool.map(foo, range(8))
print(f"Sample multiprocessing {time.time() - t1}")

To the best of my knowledge, I understand that when using multiprocessing there is some time overhead, mainly caused by the need to spawn the new processes and to copy the memory state. However, this operation should be performed just once, when the processes are initially spawned at the very beginning, and should not be that huge.
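
For reference, here is a quick sketch of how I would measure just that overhead in isolation (noop below is a throwaway do-nothing function, made up only for this check):

import time
from multiprocessing.pool import Pool

def noop(_):
    # a do-nothing task: whatever time this takes is pure multiprocessing overhead
    return None

if __name__ == "__main__":
    t1 = time.time()
    pool = Pool(8)              # spawn the 8 worker processes
    pool.map(noop, range(8))    # send one trivial task to each worker
    pool.close()
    pool.join()
    print(f"Spawn + dispatch overhead {time.time() - t1}")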

So what am I missing here? Is there something wrong in my reasoning?

Edit: I think it is better to be more explicit about my question. What I expected here was for the multiprocessed code to be slightly slower than the sequential one. It is true that I don't split the whole work across the 8 cores, but I am using 8 cores in parallel to do the same job (hence, in an ideal world, the processing time should stay more or less the same). Considering the overhead of spawning new processes, I expected a total increase in time of some (not too big) percentage, but not the ~2.6x increase I got here.

asked May 19 '20 by aprospero


2 Answers

Well, multiprocessing can't possibly make this faster: you're not dividing the work across 8 processes, you're asking each of 8 processes to do the entire thing. Each process will take at least as long as your code doing it just once without using multiprocessing.

So if multiprocessing weren't helping at all, you'd expect it to take about 8 times as long (it's doing 8x the work!) as your single-processor run. But you said it's not taking 2.31 * 8 ~= 18.5 seconds, but "only" about 6. So you're getting better than a factor of 3 speedup.

Why not more than that? Can't guess from here. That will depend on how many physical cores your machine has, and how much other stuff you're running at the same time. Each process will be 100% CPU-bound for this specific function, so the number of "logical" cores is pretty much irrelevant - there's scant opportunity for processor hyper-threading to help. So I'm guessing you have 4 physical cores.
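
If you'd rather check than guess, something along these lines reports both counts (psutil is a third-party package, so this is just a sketch assuming you have it installed):

import os

import psutil  # third-party package (pip install psutil)

# os.cpu_count() counts logical cores (hyper-threads included)
print("logical cores:", os.cpu_count())
# psutil can also report only the physical cores
print("physical cores:", psutil.cpu_count(logical=False))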

On my box

Sample timing on my box, which has 8 logical cores but only 4 physical cores, with the box otherwise left pretty quiet:

Without multiprocessing 2.468580484390259
Sample multiprocessing 4.78624415397644

As above, none of that surprises me. In fact, I was a little surprised (but pleasantly) at how effectively the program used up the machine's true capacity.

answered Sep 20 '22 by Tim Peters


@TimPeters already answered that you are actually just running the job 8 times across the 8 Pool subprocesses, so it is slower, not faster.

That answers the issue but does not really answer what your real underlying question was. It is clear from your surprise at this result that you were expecting the single job to somehow be automatically split up and run in parts across the 8 Pool processes. That is not the way it works. You have to build in/tell it how to split up the work.

Different kinds of jobs need to be subdivided in different ways, but to continue with your example you might do something like this:

import time
from multiprocessing.pool import Pool

def foo(_):
    for _ in range(100000000):
        a = 3 

def foo2(job_desc):
    start, stop = job_desc
    print(f"{start}, {stop}")

    for _ in range(start, stop):    
        a = 3 

def main():
    t1 = time.time()
    foo(1)
    print(f"Without multiprocessing {time.time() - t1}")

    pool_size = 8
    pool = Pool(pool_size)

    t1 = time.time()

    top_num = 100000000
    size = top_num // pool_size
    job_desc_list = [[size * j, size * (j+1)] for j in range(pool_size)]
    # this is in case the upper bound is not a multiple of pool_size
    job_desc_list[-1][-1] = top_num

    pool.map(foo2, job_desc_list)
    print(f"Sample multiprocessing {time.time() - t1}")


if __name__ == "__main__":
    main()

Which results in:

Without multiprocessing 3.080709171295166
0, 12500000
12500000, 25000000
25000000, 37500000
37500000, 50000000
50000000, 62500000
62500000, 75000000
75000000, 87500000
87500000, 100000000
Sample multiprocessing 1.5312283039093018

As this shows, splitting the job up does allow it to take less time. The speedup will depend on the number of CPUs. For a CPU-bound job you should try to limit the pool size to the number of CPUs. My laptop has plenty more CPUs, but some of the benefit is lost to the overhead. If the jobs were longer, this should look more useful.
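
As a rough sketch of that last point, you can tie the pool size to the machine's core count (square here is just a stand-in task for illustration):

import os
from multiprocessing.pool import Pool

def square(n):
    # stand-in for a CPU-bound task
    return n * n

if __name__ == "__main__":
    # for CPU-bound work, a pool bigger than the core count mostly adds overhead
    pool_size = os.cpu_count() or 1   # os.cpu_count() can return None on some platforms
    with Pool(pool_size) as pool:
        print(pool.map(square, range(pool_size)))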

answered Sep 20 '22 by Glenn Mackintosh