Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Possible to avoid ActiveRecord::ConnectionTimeoutError on Heroku?

On Heroku I have a rails app running that with both a couple web dynos as well as one worker dyno. I'm running thousands of worker tasks throughout the day on Sidekiq however occasionally the ActiveRecord::ConnectionTimeoutError is raised (approximately 50 times a day). I've set up my unicorn server as follows

worker_processes 4
timeout 30
preload_app true

before_fork do |server, worker|
    # As suggested here: https://devcenter.heroku.com/articles/rails-unicorn
    Signal.trap 'TERM' do
        puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
        Process.kill 'QUIT', Process.pid
    end

    if defined?(ActiveRecord::Base)
        ActiveRecord::Base.connection.disconnect!
    end
end

after_fork do |server,worker|
    if defined?(ActiveRecord::Base)
        config = Rails.application.config.database_configuration[Rails.env]
        config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
        config['pool']            = ENV['DB_POOL'] || 10
        ActiveRecord::Base.establish_connection(config)
    end

    Sidekiq.configure_client do |config|
        config.redis = { :size => 1 }
    end

    Sidekiq.configure_server do |config|
        config = Rails.application.config.database_configuration[Rails.env]
        config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
        config['pool']            = ENV['DB_POOL'] || 10
        ActiveRecord::Base.establish_connection(config)
    end
end

On heroku I've set the DB_POOL config variable to 2 as recommended by Heroku. Should these errors be happening at all? Seems odd that it would be impossible to avoid such errors, no? What would you suggest?

like image 680
Nick ONeill Avatar asked Sep 19 '13 15:09

Nick ONeill


1 Answers

A sidekiq server (the process running on your server that is actually performing the delayed tasks) will by default dial up to 25 threads to process work off its queue. Each of these threads could be requesting a connection to your primary database through ActiveRecord if your tasks require it.

If you only have a connection pool of 5 connections, but you have 25 threads trying to connect, after 5 seconds the threads will just give up if they can't get an available connection from the pool and you'll get a connection time out error.

Setting the pool size for your Sidekiq server to something closer to your concurrency level (set with the -c flag when you start the process) will help alleviate this issue at the cost of opening many more connections to your database. If you are on Heroku and are using Postgres for example, some of their plans are limited to 20, whereas others have a connection limit of 500 (source).

If you are running a multi-process server environment like Unicorn, you also need to monitor the number of connections each forked process makes as well. If you have 4 unicorn processes, and a default connection pool size of 5, your unicorn environment at any given time could have 20 live connections. You can read more about that on Heroku's docs. Note also that the DB pool size doesn’t mean that each dyno will now have that many open connections, but only that if a new connection is needed it will be created until a maximum of that many have been created.

With that said, here is what I do.

# config/initializers/unicorn.rb

if ENV['RACK_ENV'] == 'development'
  worker_processes 1
  listen "#{ENV['BOXEN_SOCKET_DIR']}/rails_app"
  timeout 120
else
  worker_processes Integer(ENV["WEB_CONCURRENCY"] || 2)
  timeout 29
end

# The timeout mechanism in Unicorn is an extreme solution that should be avoided whenever possible. 
# It will help catch bugs in your application where and when your application forgets to use timeouts,
# but it is expensive as it kills and respawns a worker process.
# see http://unicorn.bogomips.org/Application_Timeouts.html

# Heroku recommends a timeout of 15 seconds. With a 15 second timeout, the master process will send a 
# SIGKILL to the worker process if processing a request takes longer than 15 seconds. This will 
# generate a H13 error code and you’ll see it in your logs. Note, this will not generate any stacktraces 
# to assist in debugging. Using Rack::Timeout, we can get a stacktrace in the logs that can be used for
# future debugging, so we set that value to something less than this one

preload_app true # for new relic

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end

end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT'
  end

  Rails.logger.info("Done forking unicorn processes")

  #https://devcenter.heroku.com/articles/concurrency-and-database-connections
  if defined?(ActiveRecord::Base)

    db_pool_size = if ENV["DB_POOL"]
      ENV["DB_POOL"]
    else
      ENV["WEB_CONCURRENCY"] || 2
    end

    config = Rails.application.config.database_configuration[Rails.env]
    config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
    config['pool']              = ENV['DB_POOL'] || 2
    ActiveRecord::Base.establish_connection(config)

    # Turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction
    ActiveRecord::Base.connection.execute "update pg_settings set setting='off' where name = 'synchronous_commit';"    

    Rails.logger.info("Connection pool size for unicorn is now: #{ActiveRecord::Base.connection.pool.instance_variable_get('@size')}")
  end

end

And for sidekiq:

# config/initializers/sidekiq.rb

Sidekiq.configure_server do |config|

  sidekiq_pool = ENV['SIDEKIQ_DB_POOL'] || 20

  if defined?(ActiveRecord::Base)
    Rails.logger.debug("Setting custom connection pool size of #{sidekiq_pool} for Sidekiq Server")
    db_config = Rails.application.config.database_configuration[Rails.env]
    db_config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
    cb_config['pool']              = sidekiq_pool
    ActiveRecord::Base.establish_connection(db_config)

    Rails.logger.info("Connection pool size for Sidekiq Server is now: #{ActiveRecord::Base.connection.pool.instance_variable_get('@size')}")
  end
end

If all goes well, when you fire up your processes you'll see something like in your log:

Setting custom connection pool size of 10 for Sidekiq Server
Connection pool size for Sidekiq Server is now: 20
Done forking unicorn processes
   (1.4ms)  update pg_settings set setting='off' where name = 'synchronous_commit';
Connection pool size for unicorn is now: 2

Sources:

  • https://devcenter.heroku.com/articles/concurrency-and-database-connections#connection-pool
  • https://github.com/mperham/sidekiq/issues/503
  • https://github.com/mperham/sidekiq/wiki/Advanced-Options
like image 116
stereoscott Avatar answered Oct 09 '22 07:10

stereoscott