
Python threading and performance?

I had to do a heavy I/O-bound operation, i.e. parsing large files and converting them from one format to another. Initially I did it serially, i.e. parsing one file after another. Performance was very poor (it used to take 90+ seconds). So I decided to use threading to improve the performance. I created one thread per file (4 threads):

    ts = []
    for file in file_list:
        t = threading.Thread(target=self.convertfile, args=(file,))
        t.start()
        ts.append(t)
    for t in ts:
        t.join()

But to my astonishment, there is no performance improvement whatsoever. It still takes around 90+ seconds to complete the task. Since this is an I/O-bound operation, I had expected threading to improve the performance.

asked Jun 10 '10 by kumar

People also ask

Does threading improve performance in Python?

Using the threading module in Python or any other interpreted language with a GIL can actually result in reduced performance. If your code is performing a CPU bound task, such as decompressing gzip files, using the threading module will result in a slower execution time.

Does threading improve performance?

If you're unsatisfied with your computer's processing throughput, it might be time to enable hyper-threading on your central processing unit's (CPU) cores. Hyper-threading can be a great way to improve the processing speed of your PC without going through a major hardware overhaul.

Why is Python threading slow?

This is due to the Python GIL being the bottleneck, preventing threads from running fully in parallel. The best possible CPU utilisation can be achieved by using ProcessPoolExecutor or the multiprocessing module, which circumvent the GIL and let CPU-bound code run in parallel across processes.
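
As a rough, hypothetical sketch of that process-based approach (the convert_one function, worker count, and file names below are made-up placeholders, not taken from this page):

# Hypothetical sketch: each worker is a separate process with its own
# interpreter and GIL, so CPU-bound work can run truly in parallel.
from concurrent.futures import ProcessPoolExecutor

def convert_one(path):
    # Placeholder for a CPU-heavy conversion of a single file.
    return path.upper()

if __name__ == '__main__':
    files = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(convert_one, files))
    print(results)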

Should you use threading in Python?

To recap, threading in Python allows multiple threads to be created within a single process, but due to the GIL, none of them will ever run at exactly the same time. Threading is still a very good option for running multiple I/O-bound tasks concurrently, as sketched below.
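
As a minimal, hypothetical sketch of that I/O-bound pattern (the URLs and worker count are placeholders; it assumes Python 3's concurrent.futures), a thread pool can overlap several blocking downloads:

# Hypothetical sketch: threads spend most of their time blocked on the
# network, and the GIL is released during that wait, so requests overlap.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = ['https://example.com/', 'https://example.org/']  # placeholder URLs

def fetch(url):
    with urlopen(url, timeout=10) as response:
        return url, len(response.read())

with ThreadPoolExecutor(max_workers=4) as executor:
    for url, size in executor.map(fetch, URLS):
        print(url, size)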


2 Answers

Under the usual Python interpreter, threading will not allocate more CPU cores to your program because of the global interpreter lock (aka. the GIL).

The multiprocessing module could help you out here. (Note that it was introduced in Python 2.6, but backports exist for Python 2.5.)

As MSalters says, if your program is I/O bound it's debatable whether this is useful. But it might be worth a shot :)

To achieve what you want using this module:

import multiprocessing

MAX_PARALLEL_TASKS = 8  # I have an Intel Core i7 :)

pool = multiprocessing.Pool(MAX_PARALLEL_TASKS)
pool.map_async(convertfile, filelist)  # schedule all the conversions
pool.close()  # no more tasks will be submitted
pool.join()   # wait for the worker processes to finish

Important! The function that you pass to map_async must be pickleable. In general, instance methods are NOT pickleable unless you engineer them to be so! Note that convertfile above is a plain function, not a method.

If you actually need to get results back from convertfile, there are ways to do that as well. The examples on the multiprocessing documentation page should clarify.
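
For instance, here is a hedged sketch (not the documentation's own example; the filelist contents and the body of convertfile are placeholders): defining convertfile at module level keeps it pickleable, and pool.map blocks and returns the results in order.

import multiprocessing

def convertfile(path):
    # Placeholder body: parse `path`, write the converted output,
    # and return something useful to the parent process.
    return path + '.converted'

if __name__ == '__main__':
    filelist = ['a.xml', 'b.xml', 'c.xml', 'd.xml']
    pool = multiprocessing.Pool(4)
    results = pool.map(convertfile, filelist)  # blocks until every file is converted
    pool.close()
    pool.join()
    print(results)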

answered by detly


Threading allows the OS to allocate more CPU cores to your program. If it's I/O bound, that means the speed was limited by the I/O subsystem's speed rather than by CPU speed. In those cases, allocating more CPU cores doesn't necessarily help - you're still waiting on the I/O subsystem.

answered by MSalters