There are essentially 3 problems here:
1) Unicorn seems to be steadily filling up all the RAM, causing me to remove workers manually.
2) Unicorn seems to be spawning additional workers even though I have specified a fixed number of workers (7 of them). This is partly responsible for the RAM buildup, which again forces me to remove workers manually.
3) Zero downtime deployment is unreliable in my case. Sometimes it picks up the changes, sometimes I get gateway timeouts. Each deploy becomes a very stressful situation.
I don't really like using Monit, because it kills workers without waiting for them to finish serving their requests.
So, is this normal? Do other people who deploy using Unicorn have the same problem where the RAM just grows uncontrollably?
And also where the number of workers spawned does not match the number of workers defined?
The other alternative is unicorn-worker-killer, which I will be trying out after reading Unicorn Eating Memory.
Tiny Update:
So it came to a point where New Relic was telling me that memory usage was at almost 95%, so I had to kill a worker. Interestingly, killing that worker brought memory usage down by quite a lot, as seen in the graph below.
What's up with that?
For reference, here are my unicorn.rb and unicorn_init.sh. I would love for somebody to tell me that there's a mistake in there somewhere.
unicorn.rb
root = "/home/deployer/apps/myapp/current"
working_directory root

pid "#{root}/tmp/pids/unicorn.pid"
stderr_path "#{root}/log/unicorn.stderr.log"
stdout_path "#{root}/log/unicorn.log"

listen "/tmp/unicorn.myapp.sock"
worker_processes 7
timeout 30
preload_app true

before_exec do |_|
  ENV["BUNDLE_GEMFILE"] = '/home/deployer/apps/myapp/current/Gemfile'
end

before_fork do |server, worker|
  # Disconnect since the database connection will not carry over
  if defined? ActiveRecord::Base
    ActiveRecord::Base.connection.disconnect!
  end

  old_pid = "#{root}/tmp/pids/unicorn.pid.oldbin`"
  if old_pid != server.pid
    begin
      sig = (worker.nr + 1) >= server.worker_processes ? :QUIT : :TTOU
      Process.kill(sig, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
    end
  end

  sleep 1
end

after_fork do |server, worker|
  # Start up the database connection again in the worker
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end

  Redis.current.quit
  Rails.cache.reconnect
end
unicorn_init.sh
#!/bin/sh
set -e

# Feel free to change any of the following variables for your app:
TIMEOUT=${TIMEOUT-60}
APP_ROOT=/home/deployer/apps/myapp/current
PID=$APP_ROOT/tmp/pids/unicorn.pid
CMD="cd $APP_ROOT; BUNDLE_GEMFILE=/home/deployer/apps/myapp/current/Gemfile bundle exec unicorn -D -c $APP_ROOT/config/unicorn.rb -E production"
AS_USER=deployer
set -u

OLD_PIN="$PID.oldbin"

sig () {
  test -s "$PID" && kill -$1 `cat $PID`
}

oldsig () {
  test -s $OLD_PIN && kill -$1 `cat $OLD_PIN`
}

run () {
  if [ "$(id -un)" = "$AS_USER" ]; then
    eval $1
  else
    su -c "$1" - $AS_USER
  fi
}

case "$1" in
start)
  sig 0 && echo >&2 "Already running" && exit 0
  run "$CMD"
  ;;
stop)
  sig QUIT && exit 0
  echo >&2 "Not running"
  ;;
force-stop)
  sig TERM && exit 0
  echo >&2 "Not running"
  ;;
restart|reload)
  sig USR2 && echo reloaded OK && exit 0
  echo >&2 "Couldn't reload, starting '$CMD' instead"
  run "$CMD"
  ;;
upgrade)
  if sig USR2 && sleep 2 && sig 0 && oldsig QUIT
  then
    n=$TIMEOUT
    while test -s $OLD_PIN && test $n -ge 0
    do
      printf '.' && sleep 1 && n=$(( $n - 1 ))
    done
    echo

    if test $n -lt 0 && test -s $OLD_PIN
    then
      echo >&2 "$OLD_PIN still exists after $TIMEOUT seconds"
      exit 1
    fi

    exit 0
  fi
  echo >&2 "Couldn't upgrade, starting '$CMD' instead"
  run "$CMD"
  ;;
reopen-logs)
  sig USR1
  ;;
*)
  echo >&2 "Usage: $0 <start|stop|restart|upgrade|force-stop|reopen-logs>"
  exit 1
  ;;
esac
You appear to have two problems: 1) errors in the coordination of the graceful restart are causing old Unicorn workers and the old master to stick around; 2) your app (not Unicorn) is leaking memory.
For the former, looking at your before_fork code, it appears you're using the memory-constraining approach from the example config. However, you have a typo in the .oldbin file name (an extraneous back-tick at the end), which means you never signal the old process because you can't read the pid from a non-existent file.
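For clarity, here is the same before_fork block with only that back-tick removed; nothing else needs to change:

before_fork do |server, worker|
  # Disconnect since the database connection will not carry over
  if defined? ActiveRecord::Base
    ActiveRecord::Base.connection.disconnect!
  end

  # No stray back-tick at the end of the path, so File.read can find the old master's pid file
  old_pid = "#{root}/tmp/pids/unicorn.pid.oldbin"
  if old_pid != server.pid
    begin
      # TTOU scales the old master down one worker at a time;
      # QUIT retires it entirely once the last new worker forks
      sig = (worker.nr + 1) >= server.worker_processes ? :QUIT : :TTOU
      Process.kill(sig, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
    end
  end

  sleep 1
end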
For the latter, you will have to investigate and drill down. Look in your app for caching semantics that accumulate data over time; examine carefully all use of globals, class variables, and class-instance variables, which can retain data references from request to request. Run some memory profiles to characterize your memory use. You can mitigate memory leakage by killing workers when they grow bigger than some upper limit; unicorn-worker-killer makes this easy, as sketched below.
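A minimal sketch of that setup, assuming the gem is wired into config.ru as its README suggests (the request and memory limits here are illustrative, not tuned for your app):

# config.ru -- limits below are examples only; adjust for your app
require ::File.expand_path('../config/environment', __FILE__)
require 'unicorn/worker_killer'

# Gracefully restart a worker after it has served a random number of requests
# between the two bounds (randomized so workers don't all recycle at once)
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096

# Gracefully restart a worker once its RSS crosses a bound chosen
# between ~192 MB and ~256 MB
use Unicorn::WorkerKiller::Oom, (192 * (1024**2)), (256 * (1024**2))

run Rails.application  # older apps use run YourApp::Application instead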
Use unicorn-worker-killer; it makes it easier to kill workers that consume a lot of RAM :)