How do I manage connection pooling to PostgreSQL from Sidekiq?

The problem: I have a Rails application that runs a few hundred Sidekiq background processes. They all connect to a PostgreSQL database, which is not exactly happy about providing 250 connections. It can, but if all the Sidekiq processes happen to send queries to the db at the same time, it crumbles.

Option 1: I have been thinking about adding PgBouncer in front of the db. However, I cannot currently use its transaction pooling mode, since I depend on setting the search_path at the beginning of each job to determine which "country" (PostgreSQL schema) to work on (apartment gem). So I would have to use the session-based pooling mode instead, which, as far as I know, would require me to disconnect after each job in order to release the connection back into the pool, and that would be really costly performance-wise, wouldn't it? Am I missing something?
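To make the concern concrete, here is roughly what disconnect-after-each-job would look like as a Sidekiq server middleware. This is only a sketch under my assumptions: the middleware name is made up, and `disconnect!` drops every physical connection in the ActiveRecord pool, which is exactly the per-job overhead I'm worried about.

```ruby
# Sketch: close the physical connection(s) after every job so PgBouncer's
# session pool can hand the server connection back out. Name is made up.
class DisconnectAfterEachJob
  def call(worker, job, queue)
    yield
  ensure
    # With session pooling, PgBouncer only frees the server connection
    # once the client disconnects, hence the hard disconnect here.
    ActiveRecord::Base.connection_pool.disconnect!
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add DisconnectAfterEachJob
  end
end
```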

Option 2: Using application-layer connection pooling is of course also an option, but I'm not sure how I would do that for PostgreSQL with Sidekiq.
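To put rough numbers on Option 2 (all values illustrative): threads inside one Sidekiq process already share the ActiveRecord connection pool, so the worst-case connection count is processes times pool size, and fewer processes with more threads each would keep me under the limit.

```ruby
# Back-of-the-envelope check, runnable in a console. Values are examples:
# concurrency would be set via sidekiq.yml (:concurrency:) and pool via
# database.yml (pool:); they are not read from config here.
processes   = 20   # Sidekiq processes
concurrency = 10   # threads per process
pool_size   = 10   # ActiveRecord pool per process; must be >= concurrency

raise 'pool smaller than thread count' if pool_size < concurrency

puts "worst-case PostgreSQL connections: #{processes * pool_size}"  # => 200
```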

Option 3: Something I have not thought of?

Asked Mar 19 '16 by Niels Kristian


People also ask

How does connection pooling work in PostgreSQL?

Connection pooling is the process of having a pool of active connections on the backend servers. These can be used any time a user sends a request. Instead of opening, maintaining, and closing a connection when a user sends a request, the server will assign an active connection to the user.

What is connection timeout in PostgreSQL?

The default connection timeout limit to PostgreSQL data sources is 30 seconds.

What is connection limit in PostgreSQL?

Limits vary by deployment; in one common managed configuration, 15 connections are reserved for the superuser to maintain the state and integrity of your database, and 100 connections are available for you and your applications. If the number of connections to the database exceeds the 100-connection limit, new connections fail and return an error.


1 Answer

Option 1: You're correct, session pooling would require you to drop and reconnect, and that adds overhead. How costly depends on your access pattern, i.e. what fraction the connection/TCP handshake etc. makes up of the total work done, and what sort of latency you need. It's definitely worth benchmarking, but if the connections are short-lived then the overhead will be really noticeable.
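If you want to benchmark it, a crude comparison with the pg gem directly would look something like this (connection parameters are placeholders):

```ruby
require 'pg'
require 'benchmark'

# Placeholder connection settings - substitute your own.
params = { host: 'localhost', dbname: 'app_production', user: 'app' }

n = 200
Benchmark.bm(10) do |x|
  # Worst case: a full connect/handshake/disconnect around every query,
  # which approximates session pooling with a disconnect per job.
  x.report('reconnect') do
    n.times do
      conn = PG.connect(params)
      conn.exec('SELECT 1')
      conn.close
    end
  end

  # Best case: one long-lived connection reused for every query.
  x.report('reuse') do
    conn = PG.connect(params)
    n.times { conn.exec('SELECT 1') }
    conn.close
  end
end
```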

Option 2/3: You could rate-limit or throttle your Sidekiq jobs. There are a few projects tackling this (a short example follows the list)...

Queue limits

  • Sidekiq Limit Fetch: restricts the number of workers able to run specified queues simultaneously. You can pause queues and resize queue distribution dynamically. It also tracks the number of active workers per queue, supports a global mode (multiple Sidekiq processes), and offers an additional blocking queue mode.
  • Sidekiq Throttler: Sidekiq::Throttler is a middleware for Sidekiq that adds the ability to rate-limit job execution on a per-worker basis.
  • sidekiq-rate-limiter: Redis-backed, per-worker rate limits for job processing.
  • Sidekiq::Throttled: concurrency and threshold throttling.
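For example, with sidekiq-limit_fetch the cap is a one-liner in an initializer (the queue name and limit below are made up):

```ruby
# config/initializers/sidekiq.rb
# Sketch using the sidekiq-limit_fetch gem: at most 10 workers across all
# Sidekiq processes run jobs from the 'heavy' queue at once, which also
# caps the DB connections those jobs can hold.
Sidekiq::Queue['heavy'].limit = 10
```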

I got the above from https://github.com/mperham/sidekiq/wiki/Related-Projects

If your application must hold a connection per process, and you're unable to restructure it so that multiple threads can share a connection, then it's PgBouncer or application-level connection pooling. Either way, connection pooling is in effect going to throttle or limit your app in some way in order to save the DB.
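One more thought on the search_path blocker specifically: if the setting were scoped per transaction rather than per session, transaction pooling might become viable. This is only a sketch of that idea, not something from the apartment gem; the middleware and the `tenant` payload key are made up, and it relies on PostgreSQL discarding SET LOCAL at the end of each transaction.

```ruby
# Sketch: wrap each job in one transaction and scope search_path with
# SET LOCAL so nothing leaks across PgBouncer's transaction pool.
class TenantTransactionMiddleware
  def call(worker, job, queue)
    ActiveRecord::Base.transaction do
      schema = job['tenant']  # hypothetical key in the job payload
      quoted = ActiveRecord::Base.connection.quote_column_name(schema)
      # SET LOCAL lasts only until this transaction commits or rolls back.
      ActiveRecord::Base.connection.execute("SET LOCAL search_path TO #{quoted}")
      yield
    end
  end
end
```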

Answered Oct 03 '22 by Harry