
Python threading vs. multiprocessing in Linux

Based on this question, I assumed that creating a new process should be almost as fast as creating a new thread on Linux. However, a little test showed a very different result. Here's my code:

from multiprocessing import Process, Pool
from threading import Thread

times = 1000

def inc(a):
    b = 1
    return a + b

def processes():
    for i in xrange(times):
        p = Process(target=inc, args=(i, ))
        p.start()
        p.join()

def threads():
    for i in xrange(times):
        t = Thread(target=inc, args=(i, ))
        t.start()
        t.join()

Tests:

>>> timeit processes() 
1 loops, best of 3: 3.8 s per loop

>>> timeit threads() 
10 loops, best of 3: 98.6 ms per loop

So, processes are almost 40 times slower to create! Why does this happen? Is it specific to Python or to these libraries? Or did I just misinterpret the answer above?


UPD 1. To make this clearer: I understand that this piece of code doesn't actually introduce any concurrency. The goal here is to test the time needed to create a process and a thread. To get real concurrency with Python, one can use something like this:

def pools():
    pool = Pool(10)
    pool.map(inc, xrange(times))

which really does run much faster than the threaded version.


UPD 2. I have added a version with os.fork():

import os

times = 1000

for i in xrange(times):
    child_pid = os.fork()
    if child_pid:
        os.waitpid(child_pid, 0)  # parent: wait for the child to exit
    else:
        exit(-1)  # child: exit immediately

Results are:

$ time python test_fork.py 

real    0m3.919s
user    0m0.040s
sys     0m0.208s

$ time python test_multiprocessing.py 

real    0m1.088s
user    0m0.128s
sys     0m0.292s

$ time python test_threadings.py

real    0m0.134s
user    0m0.112s
sys     0m0.048s
ffriend asked Jul 02 '13 11:07



1 Answer

The question you linked to is comparing the cost of just calling fork(2) vs. pthread_create(3), whereas your code does quite a bit more, e.g. using join() to wait for the processes/threads to terminate.

If, as you say...

The goal here is to test the time needed to create a process and a thread.

...then you shouldn't be waiting for them to complete. You should be using test programs more like these...

fork.py

import os
import time

def main():
    for i in range(100):
        pid = os.fork()
        if pid:
            #print 'created new process %d' % pid
            continue
        else:
            time.sleep(1)
            return

if __name__ == '__main__':
    main()

thread.py

import thread
import time

def dummy():
    time.sleep(1)

def main():
    for i in range(100):
        tid = thread.start_new_thread(dummy, ())
        #print 'created new thread %d' % tid

if __name__ == '__main__':
    main()

...which give the following results...

$ time python fork.py
real    0m0.035s
user    0m0.008s
sys     0m0.024s

$ time python thread.py
real    0m0.032s
user    0m0.012s
sys     0m0.024s

...so there's not much difference in the creation time of threads and processes.
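To see where the time goes in the question's original loops, one option is to time the creation step and the waiting step separately inside a single program. The following is only a minimal sketch under a few assumptions: it is written for Python 3 (the code above is Python 2), it runs on a Unix-like system where os.fork() is available, and the names noop, time_threads and time_forks are made up for this illustration. Note that Thread.start() also waits until the new thread has reported that it is running, so the "create" figure for threads is an approximation.

```python
import os
import threading
import time


def noop():
    """Trivial work, so that creation cost dominates."""
    pass


def time_threads(n):
    """Return (seconds spent in start(), seconds spent in join()) for n threads."""
    start_cost = join_cost = 0.0
    for _ in range(n):
        t = threading.Thread(target=noop)
        t0 = time.perf_counter()
        t.start()
        t1 = time.perf_counter()
        t.join()
        t2 = time.perf_counter()
        start_cost += t1 - t0
        join_cost += t2 - t1
    return start_cost, join_cost


def time_forks(n):
    """Same split for os.fork(): creating the child vs. waiting for it to exit."""
    fork_cost = wait_cost = 0.0
    for _ in range(n):
        t0 = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit at once, skipping interpreter cleanup
        t1 = time.perf_counter()
        os.waitpid(pid, 0)
        t2 = time.perf_counter()
        fork_cost += t1 - t0
        wait_cost += t2 - t1
    return fork_cost, wait_cost


if __name__ == "__main__":
    for name, fn in (("thread", time_threads), ("fork", time_forks)):
        create, wait = fn(100)
        print("%-6s create: %.4fs  wait: %.4fs" % (name, create, wait))
```

Comparing the two "create" figures is closer to the fork(2) vs. pthread_create(3) comparison, while the "wait" figures show how much of the original measurement came from join() and waitpid(). Actual numbers depend on the machine and Python version.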

Aya answered Oct 25 '22 23:10
