Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Django ORM in threads and avoiding "too many clients" exception by using BoundedSemaphore

I work on manage.py command which creates about 200 threads to check remote hosts. My database setup allows me to use 120 connections, so I need to use some kind of pooling. I've tried using separated thread, like this

class Pool(Thread):
    def __init__(self):
        Thread.__init__(self)        
        self.semaphore = threading.BoundedSemaphore(10)

    def give(self, trackers):
        self.semaphore.acquire()
        data = ... some ORM (not lazy, query triggered here) ...
        self.semaphore.release()
        return data

I pass instance of this object to every check-thread but still getting "OperationalError: FATAL: sorry, too many clients already" inside Pool object after init-ing 120 threads . I've expected that only 10 database connections will be opened and threads will wait for free semaphore slot. I can check that semaphore works by commenting "release()", in that case only 10 threads will work and other will wait till app termination.

As much as I understand, every thread is opening new connection to database even if actual call is inside different thread, but why? Is there any way to perform all database queries inside only one thread?

like image 296
Riz Avatar asked Aug 08 '10 19:08

Riz


1 Answers

Django's ORM manages database connections in thread-local variables. So each different thread accessing the ORM will create its own connection. You can see that in the first few lines of django/db/backends/__init__.py.

If you want to limit the number of database connections made, you must limit the number of different threads that actually access the ORM. A solution could be to implement a service that delegates ORM requests to a pool of dedicated ORM threads. To transmit the requests and their results from and to other threads you will have to implement some sort of message passing mechanism. Since this is a typical producer/consumer problem, the Python docs about threading should give some hints how to achieve this.

Edit: I've just googled for "django connection pooling". There are many people who complain that Django does not provide a proper connection pool. Some of them managed to integrate a separate pooling package. For PostgreSQL, I would take a look at the pgpool middleware.

like image 152
Bernd Petersohn Avatar answered Nov 06 '22 21:11

Bernd Petersohn