Parallelism in Python isn't working right

I was developing an app on Google App Engine (GAE) using Python 2.7. An AJAX call requests some data from an API, and a single request takes ~200 ms. However, when I open two browsers and make two requests at nearly the same time, they take more than double that. I've tried putting everything in threads, but it didn't help. (This happens when the app is deployed, not just on the dev server.)

So I wrote this simple test, using a busy-wait loop, to see whether this is a problem with Python in general. Here is the code and the result:

import threading
from datetime import datetime

def work():
    # Record when this thread starts
    t = datetime.now()
    print threading.currentThread(), t
    # CPU-bound busy-wait loop
    i = 0
    while i < 100000000:
        i += 1
    t2 = datetime.now()
    print threading.currentThread(), t2, t2 - t

if __name__ == '__main__':
    print "single threaded:"
    t1 = threading.Thread(target=work)
    t1.start()
    t1.join()

    print "multi threaded:"
    # Start both threads before joining either, so they run concurrently
    t1 = threading.Thread(target=work)
    t1.start()
    t2 = threading.Thread(target=work)
    t2.start()
    t1.join()
    t2.join()

The result on Mac OS X, Core i7 (4 cores, 8 threads), Python 2.7:

single threaded:
<Thread(Thread-1, started 4315942912)> 2011-12-06 15:38:07.763146
<Thread(Thread-1, started 4315942912)> 2011-12-06 15:38:13.091614 0:00:05.328468

multi threaded:
<Thread(Thread-2, started 4315942912)> 2011-12-06 15:38:13.091952
<Thread(Thread-3, started 4323282944)> 2011-12-06 15:38:13.102250
<Thread(Thread-3, started 4323282944)> 2011-12-06 15:38:29.221050 0:00:16.118800
<Thread(Thread-2, started 4315942912)> 2011-12-06 15:38:29.237512 0:00:16.145560
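Putting these timings side by side shows that the threaded run was slower than plain sequential execution, not merely no faster (figures rounded from the output above):

```latex
\begin{aligned}
T_{\text{single}} &\approx 5.33\ \text{s} \\
T_{\text{sequential, 2 tasks}} &\approx 2 \times 5.33\ \text{s} = 10.66\ \text{s} \\
T_{\text{2 threads}} &\approx 16.15\ \text{s} \\
T_{\text{2 threads}} / T_{\text{single}} &\approx 16.15 / 5.33 \approx 3.0 \\
T_{\text{2 threads}} / T_{\text{sequential, 2 tasks}} &\approx 16.15 / 10.66 \approx 1.5
\end{aligned}
```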

This is pretty shocking! If a single thread takes about 5 seconds to do this, I expected two threads started at the same time to finish both tasks in about the same 5 seconds, but it takes almost triple that. This makes threading useless here, since running the tasks sequentially would be faster!

What am I missing here?
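For readers on Python 3, a minimal port of the same benchmark might look like the sketch below (not the original code). It times two runs of the busy loop back to back and then in two threads, using `time.perf_counter()` instead of `datetime.now()`; the iteration count `n = 10_000_000` is an arbitrary assumption chosen so it finishes quickly. On a standard GIL-enabled CPython build, expect the threaded time to be no better than the sequential one; exact numbers depend on the machine.

```python
import threading
import time


def busy_work(n):
    """CPU-bound busy loop, equivalent to the question's work() body."""
    i = 0
    while i < n:
        i += 1
    return i


def time_sequential(n, tasks=2):
    """Run `tasks` busy loops one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(tasks):
        busy_work(n)
    return time.perf_counter() - start


def time_threaded(n, tasks=2):
    """Run `tasks` busy loops in separate threads; return elapsed seconds."""
    threads = [threading.Thread(target=busy_work, args=(n,)) for _ in range(tasks)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 10_000_000  # assumed size; the question used 100,000,000
    print(f"sequential: {time_sequential(n):.2f} s")
    print(f"threaded:   {time_threaded(n):.2f} s")
```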

Asked Dec 06 '11 17:12 by Mohamed Khamis



2 Answers

David Beazley gave a talk about this issue at PyCon 2010. As others have already stated, for some tasks, using threads (especially on multiple cores) can lead to slower performance than performing the same task in a single thread. The problem, Beazley found, had to do with multiple cores fighting over the GIL in a "GIL battle".


To avoid GIL contention, you may get better results running the tasks in separate processes instead of separate threads. The multiprocessing module provides a convenient way to do that, especially since its API is very similar to the threading API.

import multiprocessing as mp
import datetime as dt
def work():
    t = dt.datetime.now()
    print mp.current_process().name, t
    i = 0
    while i < 100000000:
        i+=1
    t2 = dt.datetime.now()
    print mp.current_process().name, t2, t2-t

if __name__ == '__main__': 
    print "single process:"
    t1 = mp.Process(target=work)
    t1.start()
    t1.join()

    print "multi process:"
    t1 = mp.Process(target=work)
    t1.start()
    t2 = mp.Process(target=work)
    t2.start()
    t1.join()
    t2.join()

This yields:

single process:
Process-1 2011-12-06 12:34:20.611526
Process-1 2011-12-06 12:34:28.494831 0:00:07.883305
multi process:
Process-3 2011-12-06 12:34:28.497895
Process-2 2011-12-06 12:34:28.503433
Process-2 2011-12-06 12:34:36.458354 0:00:07.954921
Process-3 2011-12-06 12:34:36.546656 0:00:08.048761
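On Python 3, the same idea can be sketched with `concurrent.futures.ProcessPoolExecutor`, a higher-level wrapper built on multiprocessing. This is a swapped-in alternative to the `mp.Process` calls above, not the answer's own code, and the iteration count is an arbitrary assumption. Each worker process has its own interpreter and its own GIL, so CPU-bound work can occupy several cores at once. The `if __name__ == "__main__":` guard matters on platforms where workers are started with "spawn" (the default on Windows and macOS), because each worker re-imports the main module.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def busy_work(n):
    """CPU-bound busy loop; must be a module-level function so it can be pickled."""
    i = 0
    while i < n:
        i += 1
    return i


def run_in_processes(n, tasks=2):
    """Run `tasks` busy loops in separate worker processes.

    Returns (results, elapsed_seconds)."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=tasks) as pool:
        results = list(pool.map(busy_work, [n] * tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_in_processes(10_000_000)  # assumed size
    print(results, f"{elapsed:.2f} s")
```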

P.S. As zeekay pointed out in the comments, the GIL battle is only severe for CPU-bound tasks. It should not be a problem for I/O-bound tasks, because a thread releases the GIL while it waits on I/O.
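To illustrate the I/O-bound case, here is a minimal Python 3 sketch in which `time.sleep` stands in for a blocking API call such as the ~200 ms request in the question. `fake_api_call` is a hypothetical name, and sleeping is only an approximation of real network I/O; like blocking socket calls, it releases the GIL while waiting, so the waits of two threads overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(delay):
    """Hypothetical stand-in for a network request.

    time.sleep releases the GIL while waiting, as blocking I/O does."""
    time.sleep(delay)
    return delay


def run_in_threads(delays):
    """Issue all calls concurrently in threads; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_api_call, delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_in_threads([0.2, 0.2])
    # The two waits overlap, so elapsed is typically close to 0.2 s, not 0.4 s.
    print(results, f"{elapsed:.2f} s")
```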

Answered Oct 21 '22 12:10 by unutbu


The CPython interpreter will not allow more than one thread to execute Python bytecode at a time. Read about the GIL: http://wiki.python.org/moin/GlobalInterpreterLock

So certain tasks, CPU-bound ones in particular, cannot be run concurrently in an efficient way with threads in CPython.
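The turn-taking can be made a bit more concrete. In CPython 3.2 and later, a thread waiting for the GIL asks the running thread to release it once a time-based "switch interval" (5 ms by default) has passed, so CPU-bound threads alternate rather than run simultaneously. Below is a minimal sketch for inspecting and temporarily changing that setting; `describe_gil_switching` and `with_switch_interval` are illustrative helper names, not library functions. Python 2.7, which the question uses, had a tick-based `sys.getcheckinterval()` instead (100 bytecode instructions by default).

```python
import sys


def describe_gil_switching():
    """Return the current GIL switch interval as a readable string."""
    interval = sys.getswitchinterval()  # seconds; 0.005 unless changed
    return f"GIL switch interval: {interval * 1000:.1f} ms"


def with_switch_interval(seconds, func, *args):
    """Run func(*args) under a temporary switch interval, then restore it.

    A longer interval means fewer forced hand-offs between threads that
    compete for the GIL; it does not let them run Python code in parallel."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func(*args)
    finally:
        sys.setswitchinterval(old)


if __name__ == "__main__":
    print(describe_gil_switching())
```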

If you want to do things in parallel on GAE, start them in parallel as separate requests.

Also, you may want to consult the Python parallel processing wiki: http://wiki.python.org/moin/ParallelProcessing

Answered Oct 21 '22 12:10 by bpgergo