
How does PgBouncer help to speed up Django?

I have some management commands that are based on gevent. Since my management command makes thousands of requests, I can turn all the socket calls into non-blocking calls using gevent. This really speeds up my application, as I can make the requests simultaneously.
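In rough outline, the commands follow this pattern (a simplified sketch; the URLs, counts, and request code are placeholders rather than my real code):

    # Simplified gevent-based management command (placeholder URLs and counts).
    # monkey.patch_all() must run before anything else imports the socket module,
    # so that all socket calls become cooperative/non-blocking.
    from gevent import monkey
    monkey.patch_all()

    import gevent
    import requests
    from django.core.management.base import BaseCommand


    class Command(BaseCommand):
        help = "Fetch many URLs concurrently on gevent greenlets."

        def handle(self, *args, **options):
            urls = ["https://example.com/item/%d" % i for i in range(1000)]
            jobs = [gevent.spawn(requests.get, url) for url in urls]
            gevent.joinall(jobs, timeout=120)
            done = sum(1 for job in jobs if job.successful())
            self.stdout.write("%d of %d requests completed" % (done, len(urls)))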

Currently the bottleneck in my application seems to be Postgres. This appears to be because the psycopg2 library that Django uses to connect to Postgres is written in C and does not support asynchronous connections.

I've also read that using PgBouncer can speed up Postgres by 2x. This sounds great, but could someone explain how PgBouncer works and why it helps?

Thanks

asked May 02 '12 by Mridang Agarwalla


1 Answer

Besides saving the overhead of connect & disconnect where this is otherwise done on each request, a connection pooler can funnel a large number of client connections down to a small number of actual database connections. In PostgreSQL, the optimal number of active database connections is usually somewhere around ((2 * core_count) + effective_spindle_count). Above this number, both throughput and latency get worse. NOTE: Recent versions have improved concurrency, so in 2022 I would recommend something more like ((4 * core_count) + effective_spindle_count).
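A quick back-of-the-envelope sketch of that rule of thumb (the core and spindle counts below are assumed example values, chosen to line up with the 2000-user example that follows):

    # Sizing rule of thumb from above; the hardware numbers are assumptions.
    core_count = 16               # e.g. four quad-core processors
    effective_spindle_count = 3   # roughly 0 if the active data set is fully cached

    classic_limit = (2 * core_count) + effective_spindle_count   # ~35 connections
    modern_limit = (4 * core_count) + effective_spindle_count    # ~67, per the 2022 note

    print(classic_limit, modern_limit)   # 35 67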

Sometimes people will say "I want to support 2000 users, with fast response time." It is pretty much guaranteed that if you try to do that with 2000 actual database connections, performance will be horrible. If you have a machine with four quad-core processors and the active data set is fully cached, you will see much better performance for those 2000 users by funneling the requests through about 35 database connections.

To understand why that is true, this thought experiment should help. Consider a hypothetical database server machine with only one resource to share -- a single core. This core will time-slice equally among all concurrent requests with no overhead. Let's say 100 requests all come in at the same moment, each of which needs one second of CPU time. The core works on all of them, time-slicing among them until they all finish 100 seconds later. Now consider what happens if you put a connection pool in front which will accept 100 client connections but make only one request at a time to the database server, putting any requests which arrive while the connection is busy into a queue. Now when 100 requests arrive at the same time, one client gets a response in 1 second; another gets a response in 2 seconds, and the last client gets a response in 100 seconds. Nobody had to wait longer to get a response, throughput is the same, but the average latency is 50.5 seconds rather than 100 seconds.
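The arithmetic of that thought experiment is easy to check with a few lines, using exactly the numbers above:

    # One core, 100 requests needing 1 CPU-second each, as in the example above.
    requests_in_flight = 100
    seconds_each = 1

    # Perfect time-slicing: every request finishes at the same time, 100 s in.
    avg_latency_timesliced = requests_in_flight * seconds_each            # 100 s

    # Pool of one connection: the i-th request finishes after i seconds.
    finish_times = [i * seconds_each for i in range(1, requests_in_flight + 1)]
    avg_latency_queued = sum(finish_times) / requests_in_flight           # 50.5 s

    print(avg_latency_timesliced, avg_latency_queued)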

A real database server has more resources which can be used in parallel, but the same principle holds: once they are saturated, you only hurt things by adding more concurrent database requests. It is actually worse than the example, because with more tasks you have more task switches, increased contention for locks and cache, L2 and L3 cache line contention, and many other issues which cut into both throughput and latency. On top of that, while a high work_mem setting can help a query in a number of ways, that setting is a limit per plan node for each connection, so with a large number of connections you need to leave it very small to avoid flushing cache or even causing swapping, and keeping it that small in turn leads to slower plans or to such things as hash tables spilling to disk.
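To put rough numbers on the work_mem point, here is a purely hypothetical calculation (the memory sizes, node counts, and connection counts are assumptions, not figures from the question):

    # Hypothetical numbers: work_mem is a per-sort/per-hash (per plan node) limit,
    # so the worst case scales with connections * plan nodes * work_mem.
    work_mem_mb = 64
    sort_or_hash_nodes_per_query = 4

    unpooled_connections = 500
    pooled_connections = 35

    print((work_mem_mb * sort_or_hash_nodes_per_query * unpooled_connections) / 1024, "GiB worst case")  # 125 GiB
    print((work_mem_mb * sort_or_hash_nodes_per_query * pooled_connections) / 1024, "GiB worst case")    # ~8.8 GiB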

Some database products effectively build a connection pool into the server, but the PostgreSQL community has taken the position that since the best connection pooling is done closer to the client software, they will leave it to the users to manage this. Most poolers will have some way to limit the database connections to a hard number, while allowing more concurrent client requests than that, queuing them as necessary. This is what you want, and it should be done on a transactional basis, not per statement or connection.
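For the Django side of the original question, a minimal sketch of what this looks like in practice, assuming PgBouncer is listening locally on its default port; the database name, credentials, and pool sizes are placeholders, and the PgBouncer-side settings are noted only in comments:

    # settings.py -- Django talks to PgBouncer, which holds the real connections.
    # On the PgBouncer side (pgbouncer.ini) you would set something like:
    #     pool_mode = transaction       (pool on a transactional basis, as above)
    #     default_pool_size = 35        (per the sizing rule of thumb)
    #     max_client_conn = 2000        (many clients, few server connections)
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql_psycopg2",
            "NAME": "mydb",
            "USER": "myuser",
            "PASSWORD": "secret",
            "HOST": "127.0.0.1",   # PgBouncer, not Postgres itself
            "PORT": "6432",        # PgBouncer's default listen port
        }
    }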

answered Sep 19 '22 by kgrittn