
Why is multiprocessing slower here?

I am trying to speed up some code with multiprocessing in Python, but I cannot understand one point. Assume I have the following dumb function:

import time
from multiprocessing.pool import Pool

def foo(_):
    for _ in range(100000000):
        a = 3 

When I run this code without using multiprocessing (see the code below) on my laptop (Intel, 8-core CPU), the time taken is ~2.31 seconds.

t1 = time.time()
foo(1)
print(f"Without multiprocessing {time.time() - t1}")

Instead, when I run this code using the Python multiprocessing library (see the code below), the time taken is ~6.0 seconds.

pool = Pool(8)
t1 = time.time()
pool.map(foo, range(8))
print(f"Sample multiprocessing {time.time() - t1}")

To the best of my knowledge, I understand that when using multiprocessing there is some time overhead, mainly caused by the need to spawn the new processes and to copy the memory state. However, this operation should be performed just once, when the processes are initially spawned at the very beginning, and should not be that huge.
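
For reference, here is a quick sketch of how I would measure just that overhead in isolation (noop below is a throwaway do-nothing function, made up only for this check):

import time
from multiprocessing.pool import Pool

def noop(_):
    # a do-nothing task: whatever time this takes is pure multiprocessing overhead
    return None

if __name__ == "__main__":
    t1 = time.time()
    pool = Pool(8)              # spawn the 8 worker processes
    pool.map(noop, range(8))    # send one trivial task to each worker
    pool.close()
    pool.join()
    print(f"Spawn + dispatch overhead {time.time() - t1}")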

So what am I missing here? Is there something wrong in my reasoning?

Edit: I think it is better to be more explicit about my question. What I expected here was for the multiprocessed code to be slightly slower than the sequential one. It is true that I don't split the whole work across the 8 cores, but I am using 8 cores in parallel to do the same job (hence, in an ideal world, the processing time should stay more or less the same). Considering the overhead of spawning new processes, I expected a total increase in time of some (not too big) percentage, but not the ~2.6x increase I got here.

asked May 19 '20 by aprospero


2 Answers

Well, multiprocessing can't possibly make this faster: you're not dividing the work across 8 processes, you're asking each of 8 processes to do the entire thing. Each process will take at least as long as your code doing it just once without using multiprocessing.

So if multiprocessing weren't helping at all, you'd expect it to take about 8 times as long (it's doing 8x the work!) as your single-processor run. But you said it's not taking 2.31 * 8 ~= 18.5 seconds, but "only" about 6. So you're getting better than a factor of 3 speedup.

Why not more than that? Can't guess from here. That will depend on how many physical cores your machine has, and how much other stuff you're running at the same time. Each process will be 100% CPU-bound for this specific function, so the number of "logical" cores is pretty much irrelevant - there's scant opportunity for processor hyper-threading to help. So I'm guessing you have 4 physical cores.
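
If you'd rather check than guess, something along these lines reports both counts (psutil is a third-party package, so this is just a sketch assuming you have it installed):

import os

import psutil  # third-party package (pip install psutil)

# os.cpu_count() counts logical cores (hyper-threads included)
print("logical cores:", os.cpu_count())
# psutil can also report only the physical cores
print("physical cores:", psutil.cpu_count(logical=False))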

On my box

Sample timing on my box, which has 8 logical cores but only 4 physical cores, with the box otherwise left pretty quiet:

Without multiprocessing 2.468580484390259
Sample multiprocessing 4.78624415397644

As above, none of that surprises me. In fact, I was a little surprised (but pleasantly) at how effectively the program used up the machine's true capacity.

answered Sep 20 '22 by Tim Peters


@TimPeters already answered that you are actually just running the job 8 times across the 8 Pool subprocesses, so it is slower, not faster.

That answers the issue but does not really answer what your real underlying question was. It is clear from your surprise at this result that you were expecting the single job to somehow be automatically split up and run in parts across the 8 Pool processes. That is not the way it works. You have to build in/tell it how to split up the work.

Different kinds of jobs need to be subdivided in different ways, but to continue with your example you might do something like this:

import time
from multiprocessing.pool import Pool

def foo(_):
    for _ in range(100000000):
        a = 3 

def foo2(job_desc):
    start, stop = job_desc
    print(f"{start}, {stop}")

    for _ in range(start, stop):    
        a = 3 

def main():
    t1 = time.time()
    foo(1)
    print(f"Without multiprocessing {time.time() - t1}")

    pool_size = 8
    pool = Pool(pool_size)

    t1 = time.time()

    top_num = 100000000
    size = top_num // pool_size
    job_desc_list = [[size * j, size * (j+1)] for j in range(pool_size)]
    # this is in case the upper bound is not a multiple of pool_size
    job_desc_list[-1][-1] = top_num

    pool.map(foo2, job_desc_list)
    print(f"Sample multiprocessing {time.time() - t1}")


if __name__ == "__main__":
    main()

Which results in:

Without multiprocessing 3.080709171295166
0, 12500000
12500000, 25000000
25000000, 37500000
37500000, 50000000
50000000, 62500000
62500000, 75000000
75000000, 87500000
87500000, 100000000
Sample multiprocessing 1.5312283039093018

As this shows, splitting the job up does allow it to take less time. The speedup will depend on the number of CPUs. For a CPU-bound job you should try to limit the pool size to the number of CPUs. My laptop has plenty more CPUs, but some of the benefit is lost to the overhead. If the jobs were longer, this should look more useful.
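
As a rough sketch of that last point, you can tie the pool size to the machine's core count (square here is just a stand-in task for illustration):

import os
from multiprocessing.pool import Pool

def square(n):
    # stand-in for a CPU-bound task
    return n * n

if __name__ == "__main__":
    # for CPU-bound work, a pool bigger than the core count mostly adds overhead
    pool_size = os.cpu_count() or 1   # os.cpu_count() can return None on some platforms
    with Pool(pool_size) as pool:
        print(pool.map(square, range(pool_size)))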

answered Sep 20 '22 by Glenn Mackintosh