
Python: threading + lock slows my app down considerably

Say I have a function that writes to a file. I also have a function that loops repeatedly reading from said file. I have both of these functions running in separate threads. (Actually I am reading/writing to registers via MDIO which is why I can't have both threads executing concurrently, only one or the other, but for the sake of simplicity, let's just say it's a file)

Now when I run the write function in isolation, it executes fairly quickly. However when I'm running threaded and have it acquire a lock before running, it seems to run extremely slow. Is this because the second thread (read function) is polling to acquire the lock? Is there any way to get around this?

I am currently just using a simple RLock, but am open to any change that would increase performance.

Edit: As an example, I will put a basic example of what's going on. The read thread is basically always running, but occasionally a separate thread will make a call to load. If I benchmark by running load from cmd prompt, running in a thread is at least 3x slower.

write thread:

import usbmpc # functions I made which access dll functions for hardware, etc

def load(self, lock):
    lock.acquire()
    f = open('file.txt','r')
    data = f.readlines()
    for x in data: 
        usbmpc.write(x)
    lock.release()

read thread:

import usbmpc

def read(self, lock): 
    addr = START_ADDR
    while True: 
        lock.acquire()
        data = usbmpc.read(addr)
        lock.release()
        addr += 4
        if addr > BUF_SIZE: addr = START_ADDR
asked Mar 24 '11 by Shaunak Amin




2 Answers

Do you use threading on a multicore machine?

If the answer is yes, then unless your Python version is 3.2+ you are going to suffer reduced performance when running threaded applications.

David Beazley has put considerable effort into finding out what is going on with the GIL on multicore machines, and has made it easy for the rest of us to understand it too. Check his website and the resources there. You might also want to see his presentation at PyCon 2010. It is rather interesting.

To make a long story short: in Python 3.2, Antoine Pitrou wrote a new GIL that has the same performance on single-core and multicore machines. In earlier versions, the more cores/threads you have, the greater the performance loss...
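You can observe the effect yourself with a short benchmark sketch: run the same pure-Python CPU-bound work once sequentially and once split across two threads, and compare the wall-clock times. On pre-3.2 multicore machines the threaded run is often no faster, and sometimes slower, than the sequential one (the workload size here is arbitrary; adjust it for your machine):

```python
import threading
import time

N = 2_000_000  # arbitrary amount of CPU-bound work per task

def countdown(n, done, i):
    # pure-Python loop: CPU bound, so it holds the GIL while running
    while n > 0:
        n -= 1
    done[i] = True

# sequential baseline: run both tasks one after the other
start = time.perf_counter()
done_seq = [False, False]
countdown(N, done_seq, 0)
countdown(N, done_seq, 1)
seq = time.perf_counter() - start

# threaded: same total work, split across two threads
start = time.perf_counter()
done_thr = [False, False]
t1 = threading.Thread(target=countdown, args=(N, done_thr, 0))
t2 = threading.Thread(target=countdown, args=(N, done_thr, 1))
t1.start(); t2.start()
t1.join(); t2.join()
thr = time.perf_counter() - start

print(f"sequential: {seq:.2f}s  threaded: {thr:.2f}s")
```

Exact numbers depend heavily on the interpreter version and core count, so treat the output as a ballpark, not a measurement.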

hope it helps :)

answered Sep 30 '22 by pmav99


Why aren't you acquiring the lock in the writer for the duration of each write only? You're currently locking for the entire duration of the load function, so the reader never gets in until load is completely done.

Secondly, you should be using the lock as a context manager. Your current code is not exception safe: if a write raises, the lock is never released:

def load(lock):
    for x in data:
        with lock:
            whatever.write(x)

The same goes for your reader. Use a context to hold the lock.

Thirdly, don't use an RLock. You know you don't need one: at no point does your read/write code need to reacquire the lock, so don't give it that opportunity; you would only be masking bugs.
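Putting the three points together, here is a minimal runnable sketch of the pattern: a plain Lock (not an RLock), held as a context manager around each individual device access only. The usbmpc calls are hardware specific, so this sketch substitutes an in-memory dict for the device; the register spacing of 4 comes from the question, everything else is illustrative:

```python
import threading

lock = threading.Lock()   # plain Lock: no reacquisition needed, so no RLock
registers = {}            # stand-in for the MDIO device

def load(lines):
    # writer: take the lock per write, not for the whole load
    for i, value in enumerate(lines):
        with lock:
            registers[i * 4] = value

def read_once(addr):
    # reader: same pattern, lock held only around the single access
    with lock:
        return registers.get(addr)

writer = threading.Thread(target=load, args=(["a", "b", "c"],))
writer.start()
writer.join()
print(read_once(0), read_once(4), read_once(8))
```

Because each `with lock:` block releases the lock even if the body raises, the reader and writer can interleave access per operation instead of one starving the other.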

The real answer is in several of the comments to your question: the GIL is causing some contention (assuming it isn't actually your misuse of locking). The Python threading module is fantastic; the GIL sometimes is not, and even more so the complex, often misunderstood behaviours it generates. It's worth mentioning that throwing threads at a problem is not the panacea people believe it to be; it usually isn't the solution.

answered Sep 30 '22 by Matt Joiner