Python multiprocessing speed

I wrote this bit of code to test out Python's multiprocessing on my computer:

from multiprocessing import Pool

var = range(5000000)
def test_func(i):
    return i+1

if __name__ == '__main__':
    p = Pool()
    var = p.map(test_func, var)

I timed this using Unix's time command and the results were:

real 0m2.914s
user 0m4.705s
sys  0m1.406s

Then, using the same var and test_func() I timed:

var = map(test_func, var)

and the results were

real 0m1.785s
user 0m1.548s
sys  0m0.214s

Shouldn't the multiprocessing code be much faster than plain old map?

asked Jun 27 '12 by user1475412




2 Answers

Why should it be?

With the plain map, you are just calling the function sequentially in a single process.

A multiprocessing Pool creates a set of worker processes across which your tasks are distributed. Coordinating those workers, and shipping the inputs and results between processes, has a cost.

Try doing some significant work inside your function, then time both versions again, and see whether multiprocessing helps you compute faster.

You have to understand that there are overheads in using multiprocessing. Only when the computing effort is significantly greater than those overheads will you see its benefit.

See the last example in Doug Hellmann's excellent introduction: http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html

import multiprocessing

# start_process, do_calculation and inputs are defined in the linked article
pool_size = multiprocessing.cpu_count() * 2
pool = multiprocessing.Pool(processes=pool_size,
                            initializer=start_process,
                            maxtasksperchild=2,
                            )
pool_outputs = pool.map(do_calculation, inputs)

You size the pool according to the number of cores you have.

answered Oct 26 '22 by pyfunc


There is an overhead to using parallelization. It only pays off if each work unit takes long enough to compensate for that overhead.

Also, if you have only one CPU core (or one hardware thread) on your machine, there is no point in parallelizing at all. You will only see gains on a machine with hyperthreading or at least two CPU cores.
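You can check this up front before reaching for a pool; a minimal sketch:

```python
import multiprocessing

cores = multiprocessing.cpu_count()
print(f"{cores} CPU core(s) available")

# With a single core there is nothing to run in parallel,
# so the plain built-in map() is the better choice.
if cores < 2:
    print("parallelization is unlikely to help on this machine")
```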

In your case, a simple addition doesn't compensate for that overhead.

Try something a bit more costly, such as:

from multiprocessing import Pool
import math

def test_func(i):
    j = 0
    # enough work per item to outweigh the process overhead
    for x in range(1000000):
        j += math.atan2(i, i)
    return j

if __name__ == '__main__':
    var = range(500)
    p = Pool()
    var = p.map(test_func, var)
answered Oct 26 '22 by unode