Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiprocessing speed up vs number of cores

My laptop has 4 cores, and after some simple testing, I found my CPU usage is at 100% when I'm using multiprocessing with 4 jobs or more, about 75% with 3 jobs, about 50% with 2 jobs and 25% with 1 job. This makes perfect sense to me.

Then I found that my program runs my jobs 4 times faster with multiprocessing, but I feel it shouldn't always be 4 times faster.

For example, if I'm running 5 jobs, shouldn't my 5th job be queued and be processed only after any of those 4 jobs finished, since I have only 4 cores to use? In other words, if all the jobs are the same, and each takes T seconds so they take 5T seconds without multiprocessing, shouldn't they take 2T to process with multiprocessing given 4 cores to split up the work?

However, my testing result is about 5T/4 with multiprocessing. I'm really curious about why, below is my code for testing:

import multiprocessing
import time
def worker(num):

    print ("Worker"+str(num)+" start!")
    for i in range(30000000):
        abc = 123
    print ("Worker"+str(num)+" finished!")
    return



if __name__ == '__main__':
    jobs = []
    start = time.time()

    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()
        # p.join()


    for job in jobs:
        job.join()

    end = time.time()
    print (end - start)

EDIT: I come up with this follow up question after reading @nneonneo 's answer:

If my 5 jobs doesn't take same amount of time, but T, T, T, T and 2T seconds, and OS scheduler tries to ensure all processes get an equal share of time, then T seconds later, my first 4 jobs shall be done. Then only one of my cores can work on the last job so the total time will be T+T = 2T seconds, right? The total time won't be 6T/4 anymore.

like image 222
Arch1tect Avatar asked Jul 20 '15 04:07

Arch1tect


1 Answers

You created five processes. Therefore, all five run "at the same time". Your OS's scheduler will dutifully run all five processes according to its scheduling algorithm, which usually tries to ensure all processes get an equal share of time.

Thus, the five processes will all get about the same amount of CPU time, and so they'll finish at about the same time.

If you want to see the "expected" behaviour, create a multiprocessing.Pool with 4 workers and submit 5 jobs to it. The Pool will use only four processes which will process the incoming jobs serially.

like image 191
nneonneo Avatar answered Oct 16 '22 08:10

nneonneo