
Multi-threading in Python: is it really performance efficient most of the time?

As far as I understand, performance is what drives multi-threaded programming in most cases, though not all (irrespective of Java or Python).

I was reading this enlightening article on the GIL on SO. The article summarizes that Python adopts a GIL mechanism, i.e. only a single thread can execute Python bytecode at any given time. This makes single-threaded applications really fast.

My question is as follows:

If only one thread is served at any given point, does the multiprocessing or thread module provide a way to overcome this limitation imposed by the GIL? If not, what features do they provide for doing real multitasking work?

A question was asked in the comments on the accepted answer of the above post, but it was never answered. I had the same question in mind:

^so at any time point of time, only one thread will be serving content to client... 
so no point of actually using multithreading to improve performance. right?
brain storm asked Jul 14 '14 19:07


3 Answers

You're right about the GIL: there is no point in using multithreading for CPU-bound computation, as only one thread can execute Python bytecode at a time.

But that previous statement may have enlightened you: if your computation is not CPU-bound, you may take advantage of multithreading.

A typical example is when your application spends most of its time waiting for something.

One of many examples of a program that is not CPU-bound: say you want to build a web crawler that has to crawl many websites and store them in a database. What costs time? Waiting for the servers to send data, actually downloading the data, and storing it in the database. Nothing is CPU-bound here, so you may get a faster crawler by using a pool of crawlers instead of a single one. Typically, when one website is almost down and very slow to respond (~30s), a single-threaded application is stuck waiting for it. In a multithreaded application, the other threads continue crawling in the meantime, and that's cool.
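To make the crawler example concrete, here is a minimal sketch using a thread pool. It is only an illustration under stated assumptions: the site names are hypothetical, and `time.sleep` stands in for waiting on a slow server, so no real network access happens. Like real blocking I/O, sleeping releases the GIL, which is why the threads can overlap their waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical site names mapped to a simulated response delay in seconds.
SITES = {"site-a": 0.2, "site-b": 0.2, "site-c": 0.2, "site-d": 0.2}


def fetch(name):
    """Pretend to download a page; the thread only waits, as it would on I/O."""
    time.sleep(SITES[name])
    return name, f"<html>{name}</html>"


def crawl_sequential(names):
    """Fetch the sites one after another in a single thread."""
    return [fetch(name) for name in names]


def crawl_threaded(names, workers=4):
    """Fetch the sites with a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input names.
        return list(pool.map(fetch, names))


if __name__ == "__main__":
    names = list(SITES)

    start = time.perf_counter()
    crawl_sequential(names)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    crawl_threaded(names)
    print(f"threaded:   {time.perf_counter() - start:.2f}s")
```

With four simulated sites, the sequential version waits for each delay in turn, while the threaded version should take roughly as long as the slowest single site.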

On the other hand, as there is one GIL per process, you may use multiprocessing to do CPU-bound computation.
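As a rough sketch of that idea, the snippet below spreads a CPU-bound task over worker processes with `multiprocessing.Pool`. The prime-counting function and the chosen limits are assumptions picked only for illustration; any pure-Python number crunching would do.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, processes=2):
    """Run count_primes for each limit in a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # Each worker process has its own interpreter and its own GIL,
    # so the chunks can run on different cores at the same time.
    print(count_primes_parallel([10_000, 20_000, 30_000, 40_000]))
```

The `if __name__ == "__main__":` guard matters here: on platforms where worker processes re-import the main module, creating the pool at import time would fail.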

As a side note, there exist some more or less partial implementations of Python without the GIL. I'd like to mention one that I think is well on its way to achieving something cool: PyPy STM. Searching for "get rid of the GIL" will easily turn up a lot of threads on the subject.

Julien Palard answered Oct 15 '22 12:10


Multiprocessing side-steps the GIL issue because the code runs in separate processes, and each GIL is only concerned with its own process. Within a process, multithreading may be faster to the extent that threads are waiting for some relatively slow resource like the disk or network.

tdelaney answered Oct 15 '22 13:10


A quick Google search yielded this informative slideshow: http://www.dabeaz.com/python/UnderstandingGIL.pdf

But what it fails to present is the fact that all threads are contained within a process. The OS can schedule those threads across several cores, but in CPython the per-process GIL lets only one of them execute Python bytecode at a time. So while threading under the GIL doesn't always deliver the expected performance, for workloads that spend much of their time waiting it should, at large scales, still perform better than single-threaded operation.

Andrew answered Oct 15 '22 13:10