 

Unicorn exit timeout on Heroku after trapping TERM and sending QUIT

Tags:

heroku

unicorn

I am receiving R12 Exit Timeout errors for a Heroku app running Unicorn and Sidekiq. These errors occur 1-2 times a day and on every deploy. I understand that I need to translate Heroku's shutdown signals so that Unicorn responds correctly, but I thought I had done so in the Unicorn config below:

worker_processes 3
timeout 30
preload_app true

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn master intercepting TERM and sending myself QUIT instead. My PID is #{Process.pid}"
    Process.kill 'QUIT', Process.pid
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
    Rails.logger.info('Disconnected from ActiveRecord')
  end
end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT. My PID is #{Process.pid}"
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
    Rails.logger.info('Connected to ActiveRecord')
  end

  Sidekiq.configure_client do |config|
    config.redis = { :size => 1 }
  end
end

My logs surrounding the error look like this:

Stopping all processes with SIGTERM
Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT. My PID is 7
Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT. My PID is 11
Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT. My PID is 15
Unicorn master intercepting TERM and sending myself QUIT instead. My PID is 2
Started GET "/manage"
reaped #<Process::Status: pid 11 exit 0> worker=1
reaped #<Process::Status: pid 7 exit 0> worker=0
reaped #<Process::Status: pid 15 exit 0> worker=2
master complete
Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
Stopping remaining processes with SIGKILL
Process exited with status 137

It appears that all of the child processes were successfully reaped before the timeout. Is it possible the master is still alive? Also, should the router still be sending web requests to the dyno during shutdown, as shown in the logs?

FWIW, I'm using Heroku's zero downtime deployment plugin (https://devcenter.heroku.com/articles/labs-preboot/).

Asked by middkidd, Jul 03 '13

1 Answer

I think your custom signal handling is what's causing the timeouts here.

EDIT: I'm getting downvoted for disagreeing with Heroku's documentation and I'd like to address this.

Configuring your Unicorn application to catch and swallow the TERM signal is the most likely cause of your application hanging and not shutting down correctly.

Heroku seems to argue that catching and transforming a TERM signal into a QUIT signal is the right behavior to turn a hard shutdown into a graceful shutdown.

However, doing this seems to introduce the risk of no shutdown at all in some cases, which is the root of this bug. Users experiencing hanging dynos running Unicorn should consider the evidence and make their own decision based on first principles, not just documentation.
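If you follow that reasoning and drop the custom traps, the config reduces to connection management alone, since Unicorn's built-in TERM handler already performs a fast shutdown on its own. A minimal sketch of that alternative, reusing the asker's original `worker_processes` and `timeout` values:

```ruby
# config/unicorn.rb - sketch with no custom signal handling.
# Unicorn's default response to TERM is a quick shutdown, so Heroku's
# SIGTERM is honored without any Signal.trap blocks to swallow it.
worker_processes 3
timeout 30
preload_app true

before_fork do |server, worker|
  # Disconnect in the master so forked workers don't share a DB socket.
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |server, worker|
  # Each worker re-establishes its own database connection.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)

  # Keep the Sidekiq client's Redis connection footprint small per worker.
  Sidekiq.configure_client do |config|
    config.redis = { :size => 1 }
  end
end
```

The trade-off is that in-flight requests may be cut off rather than drained, but the dyno exits promptly and avoids the R12/SIGKILL path shown in the logs above.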

Answered by Winfield, Oct 12 '22