
Unicorn Memory Usage filling up almost all the RAM

[Image: New Relic process snapshot]

There are essentially 3 problems here:

1) Unicorn seems to be steadily filling up all the RAM, causing me to remove workers manually.

2) Unicorn seems to be spawning additional workers for some reason, although I have specified a fixed number of workers (7 of them). This contributes to the RAM buildup, which is another reason I end up removing workers manually.

3) Zero downtime deployment is unreliable in my case. Sometimes it picks up the changes, sometimes I get gateway timeouts. Each deploy becomes a very stressful situation.

I don't really like using Monit, because it kills workers without waiting for workers to finish serving their requests.

So, is this normal? Do other people who deploy using Unicorn have the same problem where the RAM just grows uncontrollably?

And do they also see the number of workers spawned not matching the number of workers defined?

The other alternative is unicorn-worker-killer, which I will try out after reading "Unicorn Eating Memory".

Tiny Update:

New Relic eventually reported memory usage at almost 95%, so I had to kill a worker. Interestingly, killing that one worker brought memory usage down by quite a lot, as the New Relic memory graph showed.

What's up with that?

For reference, here's my unicorn.rb and unicorn_init.sh. Would love for somebody to tell me that there's a mistake in there somewhere.

unicorn.rb

root = "/home/deployer/apps/myapp/current"
working_directory root
pid "#{root}/tmp/pids/unicorn.pid"
stderr_path "#{root}/log/unicorn.stderr.log"
stdout_path "#{root}/log/unicorn.log"

listen "/tmp/unicorn.myapp.sock"
worker_processes 7
timeout 30

preload_app true

before_exec do |_|
  ENV["BUNDLE_GEMFILE"] = '/home/deployer/apps/myapp/current/Gemfile'
end

before_fork do |server, worker|
  # Disconnect since the database connection will not carry over
  if defined? ActiveRecord::Base
    ActiveRecord::Base.connection.disconnect!
  end

  old_pid = "#{root}/tmp/pids/unicorn.pid.oldbin`"
  if old_pid != server.pid
    begin
      sig = (worker.nr + 1) >= server.worker_processes ? :QUIT : :TTOU
      Process.kill(sig, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
    end
  end
  sleep 1
end

after_fork do |server, worker|
  # Start up the database connection again in the worker
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end

  Redis.current.quit
  Rails.cache.reconnect
end

unicorn_init.sh

#!/bin/sh
set -e

# Feel free to change any of the following variables for your app:
TIMEOUT=${TIMEOUT-60}
APP_ROOT=/home/deployer/apps/myapp/current
PID=$APP_ROOT/tmp/pids/unicorn.pid
CMD="cd $APP_ROOT; BUNDLE_GEMFILE=/home/deployer/apps/myapp/current/Gemfile bundle exec unicorn -D -c $APP_ROOT/config/unicorn.rb -E production"
AS_USER=deployer
set -u
OLD_PIN="$PID.oldbin"

sig () {
  test -s "$PID" && kill -$1 `cat $PID`
}

oldsig () {
  test -s $OLD_PIN && kill -$1 `cat $OLD_PIN`
}

run () {
  if [ "$(id -un)" = "$AS_USER" ]; then
    eval $1
  else
    su -c "$1" - $AS_USER
  fi
}

case "$1" in
start)
  sig 0 && echo >&2 "Already running" && exit 0
  run "$CMD"
  ;;
stop)
  sig QUIT && exit 0
  echo >&2 "Not running"
  ;;
force-stop)
  sig TERM && exit 0
  echo >&2 "Not running"
  ;;
restart|reload)
  sig USR2 && echo reloaded OK && exit 0
  echo >&2 "Couldn't reload, starting '$CMD' instead"
  run "$CMD"
  ;;
upgrade)
  if sig USR2 && sleep 2 && sig 0 && oldsig QUIT
  then
    n=$TIMEOUT
    while test -s $OLD_PIN && test $n -ge 0
    do
      printf '.' && sleep 1 && n=$(( $n - 1 ))
    done
    echo

    if test $n -lt 0 && test -s $OLD_PIN
    then
      echo >&2 "$OLD_PIN still exists after $TIMEOUT seconds"
      exit 1
    fi
    exit 0
  fi
  echo >&2 "Couldn't upgrade, starting '$CMD' instead"
  run "$CMD"
  ;;
reopen-logs)
  sig USR1
  ;;
*)
  echo >&2 "Usage: $0 <start|stop|restart|upgrade|force-stop|reopen-logs>"
  exit 1
  ;;
esac
Benjamin Tan Wei Hao asked Aug 10 '13 03:08


2 Answers

You appear to have two problems: 1) You have errors in the coordination of graceful restart causing old unicorn workers and the old master to stick around; 2) Your app (not unicorn) is leaking memory.

For the former, your before_fork code appears to use the memory-constraining approach from unicorn's example config. However, the .oldbin file name has a typo (an extraneous back-tick at the end), so you never signal the old process: the pid can't be read from a non-existent file.
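To see why the stray back-tick fails silently, here is a minimal standalone sketch. The helper name old_master_signal and the pid 12345 are hypothetical; the body mirrors the signal selection and the rescue from the question's before_fork, but it returns the decision instead of calling Process.kill.

```ruby
require "tmpdir"

# Hypothetical helper mirroring the question's before_fork logic.
# Returns [signal, pid] for the old master, or nil when the pid file
# cannot be read (the empty rescue in the original hides this case).
def old_master_signal(old_pid_path, worker_nr, worker_processes)
  old_pid = File.read(old_pid_path).to_i
  # The last new worker tells the old master to QUIT; earlier workers
  # only ask it to drop one worker (TTOU).
  sig = (worker_nr + 1) >= worker_processes ? :QUIT : :TTOU
  [sig, old_pid]
rescue Errno::ENOENT
  nil
end

results = Dir.mktmpdir do |dir|
  pid_file = File.join(dir, "unicorn.pid.oldbin")
  File.write(pid_file, "12345\n") # stand-in for the old master's pid file

  {
    typo:  old_master_signal("#{pid_file}`", 6, 7), # path with the stray back-tick
    early: old_master_signal(pid_file, 0, 7),       # first of 7 new workers
    last:  old_master_signal(pid_file, 6, 7)        # last of 7 new workers
  }
end

p results
```

With the back-tick in the path, the lookup returns nil for every worker, so the old master never receives TTOU or QUIT and its workers keep running next to the new ones.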

For the latter, you will have to investigate and drill down. Look in your app for caching semantics that accumulate data over time; carefully examine all use of globals, class variables, and class-instance variables, which can retain data references from request to request. Run some memory profiles to characterize your memory use. You can mitigate memory leakage by killing workers when they grow beyond some upper limit; unicorn-worker-killer makes this easy.
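As an illustration of the kind of pattern to look for, here is a small sketch of a class-instance variable that retains data across requests, next to a bounded variant. LeakyLookup, BoundedLookup, and the limit of 100 entries are made up for this example and are not taken from the app in the question.

```ruby
# Hypothetical leak: a class-level Hash that lives as long as the worker
# process and gains one entry per distinct key, with no eviction.
class LeakyLookup
  @cache = {}

  class << self
    attr_reader :cache

    def fetch(key)
      @cache[key] ||= "value for #{key}"
    end
  end
end

# Same idea with an upper bound: once the limit is exceeded, the oldest
# entry is dropped (Ruby hashes keep insertion order, so shift removes it).
class BoundedLookup
  MAX_ENTRIES = 100 # illustrative limit

  @cache = {}

  class << self
    attr_reader :cache

    def fetch(key)
      value = (@cache[key] ||= "value for #{key}")
      @cache.shift while @cache.size > MAX_ENTRIES
      value
    end
  end
end

# Simulate 1,000 requests, each with a distinct key.
1_000.times do |i|
  LeakyLookup.fetch("request-#{i}")
  BoundedLookup.fetch("request-#{i}")
end

puts "leaky cache entries:   #{LeakyLookup.cache.size}"
puts "bounded cache entries: #{BoundedLookup.cache.size}"
```

In a long-running worker the unbounded version grows with every new key, which shows up as steadily rising memory per worker rather than as a single spike.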

dbenhur answered Nov 07 '22 12:11


Use unicorn-worker-killer; it makes it easier to kill workers that consume a lot of RAM :)
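A sketch of what the setup might look like in config.ru, following the middleware usage shown in the gem's README. The thresholds are illustrative values rather than recommendations for this app, and the snippet assumes the unicorn-worker-killer gem is already in the Gemfile. The README suggests placing these lines above the line that loads the Rails environment.

```ruby
# config.ru (fragment)
require 'unicorn/worker_killer'

# Restart a worker after it has served a randomized number of requests
# between the two bounds (illustrative values).
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096

# Restart a worker once its memory (RSS) exceeds a randomized limit
# between the two bounds, here 192 MB and 256 MB (illustrative values).
use Unicorn::WorkerKiller::Oom, (192 * (1024**2)), (256 * (1024**2))
```

The gem stops a worker with a graceful signal after it finishes its current request, which addresses the concern about Monit killing workers mid-request.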

Kazuki Ohta answered Nov 07 '22 13:11