I’m having this peculiar problem when attempting to restart Unicorn using a USR2 signal. On a clean reboot of the VPS, I have no problems with sending a USR2 signal to Unicorn and having it gracefully restart. However, after an hour or so if I try to do it again, I will be left with an old master hanging around preventing the new master from starting. I am then forced to kill the old master so the new master can start. If I reboot the VPS, it fixes it but then after an hour so the problem starts again. I'm on Rails 4, Ruby 2.0.0.
I, [2014-01-07T15:37:37.118523 #19797] INFO -- : executing ["/srv/rails/current/bin/unicorn", "-c", "/srv/rails/current/config/unicorn.rb", {12=>#<Kgio::UNIXServer:fd 12>}] (in /srv/rails/releases/20140107091945)
I, [2014-01-07T15:37:37.118983 #19797] INFO -- : forked child re-executing...
I, [2014-01-07T15:37:38.998632 #19797] INFO -- : inherited addr=/srv/rails/shared/sockets/unicorn.sock fd=12
I, [2014-01-07T15:37:38.999038 #19797] INFO -- : Refreshing Gem list
I, [2014-01-07T15:37:41.927794 #19967] INFO -- : Refreshing Gem list
/srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:219:in `pid=': Already running on PID:19967 (or pid=/srv/rails/shared/pids/unicorn.pid is stale) (ArgumentError)
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:151:in `start'
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
from /srv/rails/current/bin/unicorn:16:in `load'
from /srv/rails/current/bin/unicorn:16:in `<main>'
/srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:219:in `pid=': Already running on PID:21250 (or pid=/srv/rails/shared/pids/unicorn.pid is stale) (ArgumentError)
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:151:in `start'
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
from /srv/rails/current/bin/unicorn:16:in `load'
from /srv/rails/current/bin/unicorn:16:in `<main>'
E, [2014-01-07T15:40:46.720131 #20878] ERROR -- : reaped #<Process::Status: pid 21075 exit 1> exec()-ed
E, [2014-01-07T15:40:46.720870 #20878] ERROR -- : master loop error: Already running on PID:21250 (or pid=/srv/rails/shared/pids/unicorn.pid is stale) (ArgumentError)
E, [2014-01-07T15:40:46.723525 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:219:in `pid='
E, [2014-01-07T15:40:46.723671 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:413:in `reap_all_workers'
E, [2014-01-07T15:40:46.723747 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:292:in `join'
E, [2014-01-07T15:40:46.723815 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
E, [2014-01-07T15:40:46.723880 #20878] ERROR -- : /srv/rails/current/bin/unicorn:16:in `load'
E, [2014-01-07T15:40:46.723930 #20878] ERROR -- : /srv/rails/current/bin/unicorn:16:in `<main>'
E, [2014-01-07T15:41:13.704700 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:13.704901 #21250] ERROR -- : retrying in 0.5 seconds (4 tries left)
E, [2014-01-07T15:41:14.205452 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:14.205597 #21250] ERROR -- : retrying in 0.5 seconds (3 tries left)
78.40.124.16, 173.245.49.122 - - [07/Jan/2014 15:41:14] "GET / HTTP/1.0" 200 28697 0.8345
E, [2014-01-07T15:41:14.706179 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:14.706335 #21250] ERROR -- : retrying in 0.5 seconds (2 tries left)
E, [2014-01-07T15:41:15.206834 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:15.206987 #21250] ERROR -- : retrying in 0.5 seconds (1 tries left)
E, [2014-01-07T15:41:15.707431 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:15.707563 #21250] ERROR -- : retrying in 0.5 seconds (0 tries left)
78.40.124.16, 149.154.158.74 - - [07/Jan/2014 15:41:15] "GET / HTTP/1.0" 200 32866 0.4528
E, [2014-01-07T15:41:16.208055 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
/srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/socket_helper.rb:158:in `initialize': Address already in use - "/srv/rails/shared/sockets/unicorn.sock" (Errno::EADDRINUSE)
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/socket_helper.rb:158:in `new'
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/socket_helper.rb:158:in `bind_listen'
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:255:in `listen'
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:801:in `block in bind_new_listeners!'
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:801:in `each'
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:801:in `bind_new_listeners!'
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:146:in `start'
from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
from /srv/rails/current/bin/unicorn:16:in `load'
from /srv/rails/current/bin/unicorn:16:in `<main>'
deploy_path = "/srv/rails"
RAILS_ENV = ENV['RAILS_ENV'] || "production"
working_directory "#{deploy_path}/current"
pid "#{deploy_path}/shared/pids/unicorn.pid"
stderr_path "#{deploy_path}/shared/log/unicorn.log"
# Listen on a UNIX data socket
listen "#{deploy_path}/shared/sockets/unicorn.sock"
worker_processes 4
# Preload application before forking worker processes
preload_app true
# Restart any workers that haven't responded in 30 seconds
timeout 30
before_fork do |server, worker|
##
# When sent a USR2, Unicorn will suffix its pidfile with .oldbin and
# immediately start loading up a new version of itself (loaded with a new
# version of our app). When this new Unicorn is completely loaded
# it will begin spawning workers. The first worker spawned will check to
# see if an .oldbin pidfile exists. If so, this means we've just booted up
# a new Unicorn and need to tell the old one that it can now die. To do so
# we send it a QUIT.
#
# Using this method we get 0 downtime deploys.
old_pid = "#{server.config[:pid]}.oldbin"
if File.exists?(old_pid) && server.pid != old_pid
begin
Process.kill("QUIT", File.read(old_pid).to_i)
rescue Errno::ENOENT, Errno::ESRCH => e
log = File.open(Rails.root.join('log/unicorn.log'), "a")
log.puts "Error encountered when killing process:\n"
log.puts "#{e.message}"
log.close
end
end
# the following is recomended for Rails + "preload_app true"
# as there's no need for the master process to hold a connection
if defined?(ActiveRecord::Base)
ActiveRecord::Base.connection.disconnect!
end
end
after_fork do |server, worker|
##
# Unicorn master loads the app then forks off workers - because of the way
# Unix forking works, we need to make sure we aren't using any of the parent's
# sockets, e.g. db connection
ActiveRecord::Base.establish_connection
# Redis and Memcached would go here but their connections are established
# on demand, so the master never opens a socket
##
# Unicorn master is started as root, which is fine, but let's
# drop the workers to deployer
begin
uid, gid = Process.euid, Process.egid
user, group = 'deployer', 'deployer'
target_uid = Etc.getpwnam(user).uid
target_gid = Etc.getgrnam(group).gid
worker.tmp.chown(target_uid, target_gid)
if uid != target_uid || gid != target_gid
Process.initgroups(user, target_gid)
Process::GID.change_privilege(target_gid)
Process::UID.change_privilege(target_uid)
end
rescue => e
if RAILS_ENV == 'development'
STDERR.puts "couldn't change user, oh well"
else
raise e
end
end
end
require 'bundler/capistrano' # runs a bundle install --deployment
# https://github.com/sstephenson/rbenv/issues/101
set :keep_releases, 10
set :shared_children, shared_children + %w(public/images public/uploads)
# Multistage extension
set :stages, ["production", "staging"]
set :default_stage, "staging"
require 'capistrano/ext/multistage'
require 'underglow/capistrano'
# Whenever crontab updates
set :whenever_environment, defer { stage }
set :whenever_command, "bin/whenever"
require 'whenever/capistrano'
set :application, "rails"
set :user, "deployer"
default_run_options[:pty] = true
default_run_options[:shell] = '/bin/zsh'
set :use_sudo, false
# repository
set :repository, "XXXXXXXXXXXXXXXXX"
set :branch, fetch(:branch, "master") # can specify a branch from `cap -S branch="<branch_name>"`
set :scm, :git
set :scm_verbose, true
set :ssh_options, forward_agent: true
set :deploy_to, "/srv/rails"
set :deploy_via, :remote_cache
# We're using a rbenv user install, setup the PATH we need to access the rbenv shims
set :default_environment, {
'PATH' => "$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH"
}
Has anyone seen this?
You should check the unicorn stdout/stderr logs for more evidence about why the old unicorn may be hanging or the new one failing to kill it off properly.
One gotcha is that if the older capistrano release directory has been removed during deployment of the new release, you may have bundler errors during the hot-swap handoff. Folks advise adding the following to bind to the permanent path to the Gemfile
vs. the release-specific path:
before_exec do |server|
ENV['BUNDLE_GEMFILE'] = "#{deploy_path}/current/Gemfile"
end
If that is the problem you're having, you should be seeing bundler errors or a failure in the unicorn logs.
This may not help you, but here is what I did to "fix" the problem.
I started getting this problem from the release of Unicorn 4.7.0. In 4.7.0, the behavior of how pid files were written changed, and broke my restart script. The old pre-4.7.0 behavior was: move pid file to oldpid, write new pid, start up workers, shut down master. The last step was in my unicorn.rb file of course. The new behavior was to remove the old pid quickly, and only write the new one after some heavy lifting occurred. This broke my script as it could not trust that things restarted properly. This caused my sh script to attempt to restart it, leading to confusion with the now-freshly started unicorn process and the eh-script started "full start" one. Both lost in various ways, so both exited, leaving a old master still serving requests.
I also had a defect in my unicorn.rb file which did not properly set up bundler, as someone already mentioned.
Upgrading to Unicorn 4.8.1, released recently, fixed this problem as pid files are written as they were in pre-4.7.0 days.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With