Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unicorn USR2 Restart Hanging Problems

I’m having this peculiar problem when attempting to restart Unicorn using a USR2 signal. On a clean reboot of the VPS, I have no problems with sending a USR2 signal to Unicorn and having it gracefully restart. However, after an hour or so if I try to do it again, I will be left with an old master hanging around preventing the new master from starting. I am then forced to kill the old master so the new master can start. If I reboot the VPS, it fixes it but then after an hour so the problem starts again. I'm on Rails 4, Ruby 2.0.0.

unicorn.log

I, [2014-01-07T15:37:37.118523 #19797]  INFO -- : executing ["/srv/rails/current/bin/unicorn", "-c", "/srv/rails/current/config/unicorn.rb", {12=>#<Kgio::UNIXServer:fd 12>}] (in /srv/rails/releases/20140107091945)
I, [2014-01-07T15:37:37.118983 #19797]  INFO -- : forked child re-executing...
I, [2014-01-07T15:37:38.998632 #19797]  INFO -- : inherited addr=/srv/rails/shared/sockets/unicorn.sock fd=12
I, [2014-01-07T15:37:38.999038 #19797]  INFO -- : Refreshing Gem list
I, [2014-01-07T15:37:41.927794 #19967]  INFO -- : Refreshing Gem list
/srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:219:in `pid=': Already running on PID:19967 (or pid=/srv/rails/shared/pids/unicorn.pid is stale) (ArgumentError)
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:151:in `start'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
        from /srv/rails/current/bin/unicorn:16:in `load'
        from /srv/rails/current/bin/unicorn:16:in `<main>'
/srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:219:in `pid=': Already running on PID:21250 (or pid=/srv/rails/shared/pids/unicorn.pid is stale) (ArgumentError)
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:151:in `start'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
        from /srv/rails/current/bin/unicorn:16:in `load'
        from /srv/rails/current/bin/unicorn:16:in `<main>'
E, [2014-01-07T15:40:46.720131 #20878] ERROR -- : reaped #<Process::Status: pid 21075 exit 1> exec()-ed
E, [2014-01-07T15:40:46.720870 #20878] ERROR -- : master loop error: Already running on PID:21250 (or pid=/srv/rails/shared/pids/unicorn.pid is stale) (ArgumentError)
E, [2014-01-07T15:40:46.723525 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:219:in `pid='
E, [2014-01-07T15:40:46.723671 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:413:in `reap_all_workers'
E, [2014-01-07T15:40:46.723747 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:292:in `join'
E, [2014-01-07T15:40:46.723815 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
E, [2014-01-07T15:40:46.723880 #20878] ERROR -- : /srv/rails/current/bin/unicorn:16:in `load'
E, [2014-01-07T15:40:46.723930 #20878] ERROR -- : /srv/rails/current/bin/unicorn:16:in `<main>'
E, [2014-01-07T15:41:13.704700 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:13.704901 #21250] ERROR -- : retrying in 0.5 seconds (4 tries left)
E, [2014-01-07T15:41:14.205452 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:14.205597 #21250] ERROR -- : retrying in 0.5 seconds (3 tries left)
78.40.124.16, 173.245.49.122 - - [07/Jan/2014 15:41:14] "GET / HTTP/1.0" 200 28697 0.8345
E, [2014-01-07T15:41:14.706179 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:14.706335 #21250] ERROR -- : retrying in 0.5 seconds (2 tries left)
E, [2014-01-07T15:41:15.206834 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:15.206987 #21250] ERROR -- : retrying in 0.5 seconds (1 tries left)
E, [2014-01-07T15:41:15.707431 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:15.707563 #21250] ERROR -- : retrying in 0.5 seconds (0 tries left)
78.40.124.16, 149.154.158.74 - - [07/Jan/2014 15:41:15] "GET / HTTP/1.0" 200 32866 0.4528
E, [2014-01-07T15:41:16.208055 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
/srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/socket_helper.rb:158:in `initialize': Address already in use - "/srv/rails/shared/sockets/unicorn.sock" (Errno::EADDRINUSE)
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/socket_helper.rb:158:in `new'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/socket_helper.rb:158:in `bind_listen'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:255:in `listen'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:801:in `block in bind_new_listeners!'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:801:in `each'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:801:in `bind_new_listeners!'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:146:in `start'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
        from /srv/rails/current/bin/unicorn:16:in `load'
        from /srv/rails/current/bin/unicorn:16:in `<main>'

unicorn.rb

deploy_path = "/srv/rails"
RAILS_ENV = ENV['RAILS_ENV'] || "production"

working_directory "#{deploy_path}/current"
pid "#{deploy_path}/shared/pids/unicorn.pid"
stderr_path "#{deploy_path}/shared/log/unicorn.log"

# Listen on a UNIX data socket
listen "#{deploy_path}/shared/sockets/unicorn.sock"
worker_processes 4

# Preload application before forking worker processes
preload_app true

# Restart any workers that haven't responded in 30 seconds
timeout 30

before_fork do |server, worker|
  ##
  # When sent a USR2, Unicorn will suffix its pidfile with .oldbin and
  # immediately start loading up a new version of itself (loaded with a new
  # version of our app). When this new Unicorn is completely loaded
  # it will begin spawning workers. The first worker spawned will check to
  # see if an .oldbin pidfile exists. If so, this means we've just booted up
  # a new Unicorn and need to tell the old one that it can now die. To do so
  # we send it a QUIT.
  #
  # Using this method we get 0 downtime deploys.

  old_pid = "#{server.config[:pid]}.oldbin"

  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH => e
      log = File.open(Rails.root.join('log/unicorn.log'), "a")
      log.puts "Error encountered when killing process:\n"
      log.puts "#{e.message}"
      log.close
    end
  end

  # the following is recomended for Rails + "preload_app true"
  # as there's no need for the master process to hold a connection
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end
end

after_fork do |server, worker|
  ##
  # Unicorn master loads the app then forks off workers - because of the way
  # Unix forking works, we need to make sure we aren't using any of the parent's
  # sockets, e.g. db connection

  ActiveRecord::Base.establish_connection
  # Redis and Memcached would go here but their connections are established
  # on demand, so the master never opens a socket

  ##
  # Unicorn master is started as root, which is fine, but let's
  # drop the workers to deployer
  begin
    uid, gid = Process.euid, Process.egid
    user, group = 'deployer', 'deployer'
    target_uid = Etc.getpwnam(user).uid
    target_gid = Etc.getgrnam(group).gid
    worker.tmp.chown(target_uid, target_gid)
    if uid != target_uid || gid != target_gid
      Process.initgroups(user, target_gid)
      Process::GID.change_privilege(target_gid)
      Process::UID.change_privilege(target_uid)
    end
  rescue => e
    if RAILS_ENV == 'development'
      STDERR.puts "couldn't change user, oh well"
    else
      raise e
    end
  end
end

deploy.rb

require 'bundler/capistrano'    # runs a bundle install --deployment

# https://github.com/sstephenson/rbenv/issues/101
set :keep_releases, 10
set :shared_children, shared_children + %w(public/images public/uploads)

# Multistage extension
set :stages, ["production", "staging"]
set :default_stage, "staging"
require 'capistrano/ext/multistage'
require 'underglow/capistrano'

# Whenever crontab updates
set :whenever_environment, defer { stage }
set :whenever_command, "bin/whenever"
require 'whenever/capistrano'

set :application, "rails"
set :user, "deployer"

default_run_options[:pty] = true
default_run_options[:shell] = '/bin/zsh'
set :use_sudo, false

# repository
set :repository,      "XXXXXXXXXXXXXXXXX"
set :branch,          fetch(:branch, "master")  # can specify a branch from `cap -S branch="<branch_name>"`
set :scm,             :git
set :scm_verbose,     true

set :ssh_options, forward_agent: true

set :deploy_to,       "/srv/rails"
set :deploy_via,      :remote_cache

# We're using a rbenv user install, setup the PATH we need to access the rbenv shims
set :default_environment, {
  'PATH' => "$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH"
}

Has anyone seen this?

like image 811
axsuul Avatar asked Jan 12 '14 21:01

axsuul


2 Answers

You should check the unicorn stdout/stderr logs for more evidence about why the old unicorn may be hanging or the new one failing to kill it off properly.

One gotcha is that if the older capistrano release directory has been removed during deployment of the new release, you may have bundler errors during the hot-swap handoff. Folks advise adding the following to bind to the permanent path to the Gemfile vs. the release-specific path:

before_exec do |server|
  ENV['BUNDLE_GEMFILE'] = "#{deploy_path}/current/Gemfile"
end

If that is the problem you're having, you should be seeing bundler errors or a failure in the unicorn logs.

like image 114
Winfield Avatar answered Sep 26 '22 13:09

Winfield


This may not help you, but here is what I did to "fix" the problem.

I started getting this problem from the release of Unicorn 4.7.0. In 4.7.0, the behavior of how pid files were written changed, and broke my restart script. The old pre-4.7.0 behavior was: move pid file to oldpid, write new pid, start up workers, shut down master. The last step was in my unicorn.rb file of course. The new behavior was to remove the old pid quickly, and only write the new one after some heavy lifting occurred. This broke my script as it could not trust that things restarted properly. This caused my sh script to attempt to restart it, leading to confusion with the now-freshly started unicorn process and the eh-script started "full start" one. Both lost in various ways, so both exited, leaving a old master still serving requests.

I also had a defect in my unicorn.rb file which did not properly set up bundler, as someone already mentioned.

Upgrading to Unicorn 4.8.1, released recently, fixed this problem as pid files are written as they were in pre-4.7.0 days.

like image 40
Michael Graff Avatar answered Sep 22 '22 13:09

Michael Graff