Sidekiq concurrency and database connection pool

Here is my problem: each night I have to process around 50k background jobs, each taking an average of 60s. Those jobs basically call the Facebook, Instagram and Twitter APIs to collect users' posts and save them in my DB. The jobs are processed by Sidekiq.

At first, my setup was (the corresponding config excerpts are sketched after this list):

  • :concurrency: 5 in sidekiq.yml

  • pool: 5 in my database.yml

  • RAILS_MAX_THREADS set to 5 in my Web Server (puma) configuration.
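For reference, a sketch of what that initial setup looks like in the config files (standard Rails file locations assumed; the values are the ones listed above):

```yaml
# config/sidekiq.yml -- Sidekiq worker threads per process
:concurrency: 5

# config/database.yml -- ActiveRecord connection pool per process
production:
  adapter: postgresql
  pool: 5
```

On the Puma side, the config/puma.rb that Rails generates reads RAILS_MAX_THREADS to set its thread count, which is where the 5 web threads come from.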

My understanding is:

  • my web server (rails s) will use at most 5 threads, hence at most 5 connections to my DB, which is OK since the connection pool is set to 5.

  • my Sidekiq process will use 5 threads (as the concurrency is set to 5), which is also OK since the connection pool is set to 5 (a quick runtime check is sketched after this list).
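If you want to verify that at runtime, ActiveRecord (Rails 5.1+) exposes pool statistics; a quick sketch you can run from a console or log from inside a job (the example output is illustrative):

```ruby
# Shows the configured pool size and whether threads are queuing for a connection.
stats = ActiveRecord::Base.connection_pool.stat
Rails.logger.info("AR pool: #{stats.inspect}")
# e.g. {:size=>5, :connections=>5, :busy=>5, :dead=>0, :idle=>0, :waiting=>3, :checkout_timeout=>5}
# A non-zero :waiting means threads are blocked waiting for a free DB connection.
```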

In order to process more jobs at the same time and reduce the overall time needed to process all my jobs, I decided to increase the Sidekiq concurrency to 25. In production, I provisioned a Heroku Postgres Standard database with a maximum of 120 connections, to be sure I would be able to use that Sidekiq concurrency.
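(As a rough connection budget, assuming one web dyno and one worker dyno: 1 × 5 Puma threads + 1 × 25 Sidekiq threads = 30 connections at most, comfortably below the 120-connection limit.)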

Thus, the setup is now (per-process pool sizing is sketched after this list):

  • :concurrency: 25 in sidekiq.yml

  • pool: 25 in my database.yml

  • RAILS_MAX_THREADS set to 5 in my Web Server (puma) configuration.
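A pattern that is often recommended here (and, assuming it fits your deployment, a reasonable answer to the "correct way to set those parameters" question) is to size the pool per process from an environment variable, since Puma and Sidekiq are separate processes with separate connection pools. A sketch, using a custom DB_POOL variable (not a Rails built-in, just a name chosen for this example):

```yaml
# config/database.yml -- each process gets a pool matching its own thread count:
# the web dyno runs with RAILS_MAX_THREADS=5, the Sidekiq dyno with DB_POOL=25.
production:
  adapter: postgresql
  pool: <%= ENV.fetch("DB_POOL") { ENV.fetch("RAILS_MAX_THREADS") { 5 } } %>
```

The key point is that the pool is per process: the Sidekiq process needs pool >= :concurrency, independently of what RAILS_MAX_THREADS is set to for the web process.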

I can see that 25 Sidekiq workers are working, but each job is taking way more time (sometimes more than 40 minutes instead of 1 minute)!?

Actually, I've done some tests and realized that processing 50 of my jobs with a Sidekiq concurrency of 5, 10 or 25 results in the same duration. It is as if there were a bottleneck of 5 connections somewhere.

I have checked the Sidekiq documentation and some other posts on SO (sidekiq - Is concurrency > 50 stable?, Scaling sidekiq network archetecture: concurrency vs processes), but I haven't been able to solve my problem.

So I am wondering:

  • Is my understanding of the Rails database.yml connection pool and Sidekiq concurrency right?

  • What's the correct way to set up those parameters?

asked Aug 19 '17 by Heimezi



1 Answer

Dropping this here in case someone else could use a quick, very general pointer:

Sometimes increasing the number of concurrent workers may not yield the expected results.

For instance, if there's a large discrepancy between the number of tasks and the number of cores, the scheduler will keep switching between your tasks, and there isn't really much to gain: the jobs will just take about the same amount of time, or a bit more.

Here's a link to a rather interesting read on how job scheduling works: https://en.wikipedia.org/wiki/Scheduling_(computing)#Operating_system_process_scheduler_implementations

There are other aspects to consider as well, such as datastore access: are your workers using the same table(s)? Is it backed by a storage engine that locks the entire table, such as MyISAM? If so, it won't matter whether you have 100 workers running at the same time with enough RAM and cores; they will all be waiting in line for whichever query is running to release the lock on the table they're all meant to be working with. This can also happen with tables using engines such as InnoDB, which doesn't lock the entire table on write: you may still have different workers contending for the same rows (InnoDB uses row-level locking), or simply some large indexes that don't lock but do slow the table down.
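To make the row-contention case concrete, a hypothetical sketch (model, job and helper names are made up, not taken from the question): if every worker updates the same row, the updates serialize on that row's lock no matter how many threads you run.

```ruby
class CollectPostsJob
  include Sidekiq::Worker

  def perform(account_id)
    posts = fetch_posts_from_api(account_id)  # hypothetical helper calling the social APIs
    stats = CrawlStats.first                  # every job touches the same stats row
    stats.with_lock do                        # takes a row-level lock (SELECT ... FOR UPDATE)
      stats.update!(posts_count: stats.posts_count + posts.size)
    end
  end
end
```

With 25 threads doing this, 24 of them spend their time waiting for the lock (while each still holds a DB connection from the pool).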

Another issue I've encountered was related to Rails (which I'm assuming you're using) taking quite a toll on RAM in some cases, so you might want to look at your memory footprint as well.

My suggestion is to turn on logging and look at the data: where do your workers spend most of their time? Is it something on the network layer (unlikely)? Waiting to get access to a core? Reading/writing from your data store? Is your machine swapping?
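As a starting point for that logging, a minimal sketch using Ruby's standard Benchmark module (the job structure and helper names are assumptions about your code):

```ruby
require "benchmark"

class CollectPostsJob
  include Sidekiq::Worker

  def perform(account_id)
    api_time = Benchmark.realtime { @posts = fetch_posts_from_api(account_id) }  # hypothetical helper
    db_time  = Benchmark.realtime { save_posts(@posts) }                         # hypothetical helper
    Sidekiq.logger.info("account=#{account_id} api=#{api_time.round(2)}s db=#{db_time.round(2)}s")
  end
end
```

If the API phase dominates, the bottleneck is the external services (rate limits, throttling); if the DB phase dominates, look at locks and at connection-pool waits.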

answered Sep 18 '22 by Nick M