Capistrano not restarting Mongrel clusters properly

I have a cluster of three mongrels running under nginx, and I deploy the app using Capistrano 2.4.3. When I run "cap deploy" against a running system, the behavior is:

  1. The app is deployed. The code is successfully updated.
  2. In the cap deploy output, there is this:

    • executing "sudo -p 'sudo password: ' mongrel_rails cluster::restart -C /var/www/rails/myapp/current/config/mongrel_cluster.yml"
    • servers: ["myip"]
    • [myip] executing command
    • ** [out :: myip] stopping port 9096
    • ** [out :: myip] stopping port 9097
    • ** [out :: myip] stopping port 9098
    • ** [out :: myip] already started port 9096
    • ** [out :: myip] already started port 9097
    • ** [out :: myip] already started port 9098
  3. I check immediately on the server and find that Mongrel is still running, and the PID files are still present for the previous three instances.
  4. A short time later (less than one minute), I find that Mongrel is no longer running, the PID files are gone, and it has failed to restart.
  5. If I start mongrel on the server by hand, the app starts up just fine.

It seems like 'mongrel_rails cluster::restart' isn't properly waiting for a full stop before attempting a restart of the cluster. How do I diagnose and fix this issue?

EDIT: Here's the answer:

mongrel_cluster, in the "restart" task, simply does this:

 def run
   stop    # returns as soon as the stop is issued...
   start   # ...so start can run while the old mongrels are still exiting
 end

It doesn't wait or check that the processes have exited before invoking "start". This is a known bug, with an outstanding patch submitted. I applied the patch to mongrel_cluster and the problem disappeared.
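The patch essentially makes restart wait for the old processes to be gone before starting new ones. A minimal standalone sketch of that idea (not the actual patch; the pid-file glob, paths, and timeout are assumptions):

 # Sketch: stop, wait for the pid files to disappear, then start.
 # mongrel_rails removes its pid file on a clean shutdown, so an empty
 # glob is a reasonable proxy for "all processes have exited".

 CONF = "/var/www/rails/myapp/current/config/mongrel_cluster.yml" # from the question
 PIDS = "/var/www/rails/myapp/shared/pids/mongrel.*.pid"          # assumed layout

 def wait_for_exit(glob, timeout = 30)
   deadline = Time.now + timeout
   until Dir.glob(glob).empty?
     raise "mongrels still running after #{timeout}s" if Time.now > deadline
     sleep 0.5
   end
 end

 system("mongrel_rails cluster::stop -C #{CONF}") or raise "stop failed"
 wait_for_exit(PIDS)
 system("mongrel_rails cluster::start -C #{CONF}") or raise "start failed"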

asked Sep 30 '08 by Pete

2 Answers

You can explicitly tell the mongrel_cluster recipes to remove the PID files before a start by adding the following to your Capistrano recipes:

# helps keep mongrel pid files clean
set :mongrel_clean, true

This causes it to pass the --clean option to mongrel_cluster_ctl.
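If you would rather be explicit than rely on the setting, you can issue the command yourself from a task. A sketch (config path taken from the question; whether cluster::restart accepts --clean directly is an assumption, since the recipe normally handles this for you):

namespace :deploy do
  desc "Restart mongrels, clearing stale pid files first (sketch)."
  task :restart, :roles => :app do
    # --clean asks mongrel_cluster to delete leftover pid files before
    # starting; passing it here directly is an assumption.
    sudo "mongrel_rails cluster::restart --clean -C /var/www/rails/myapp/current/config/mongrel_cluster.yml"
  end
end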

I went back and looked at one of my deployment recipes and noticed that I had also changed the way my restart task worked. Take a look at the following message in the mongrel users group:

mongrel users discussion of restart

The following is my deploy:restart task. I admit it's a bit of a hack.

namespace :deploy do
  desc "Restart the Mongrel processes on the app server."
  task :restart, :roles => :app do
    mongrel.cluster.stop
    sleep 2.5  # crude: give the old mongrels time to exit before starting
    mongrel.cluster.start
  end
end
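If the fixed sleep turns out to be racy, a variant that polls for the pid files to disappear before starting is a bit more robust. A sketch, assuming the pid files live under shared_path/pids (match this to the pid_file setting in your mongrel_cluster.yml):

namespace :deploy do
  desc "Restart Mongrel, waiting for the old processes to exit (sketch)."
  task :restart, :roles => :app do
    mongrel.cluster.stop
    # Poll on the server until the pid files are gone, up to ~30 seconds;
    # a nonzero exit here makes Capistrano abort the deploy.
    run "i=0; while ls #{shared_path}/pids/mongrel.*.pid >/dev/null 2>&1; " \
        "do i=$((i+1)); [ $i -gt 30 ] && exit 1; sleep 1; done"
    mongrel.cluster.start
  end
end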
answered by rwc9u


First, narrow the scope of what you're testing by only calling cap deploy:restart. You might want to pass the --debug option to be prompted before remote execution, or the --dry-run option just to see what's going on as you tweak your settings.

At first glance, this sounds like a permissions issue on the PID files or the mongrel processes, but it's difficult to know for sure. A couple of things catch my eye:

  • the :runner variable is explicitly set to nil -- was there a specific reason for this?
  • Capistrano 2.4 introduced new behavior for the :admin_runner variable (see the sketch after the quoted release notes). Without seeing the entire recipe, is this possibly related to your problem?

    :runner vs. :admin_runner (from capistrano 2.4 release) Some cappers have noted that having deploy:setup and deploy:cleanup run as the :runner user messed up their carefully crafted permissions. I agreed that this was a problem. With this release, deploy:start, deploy:stop, and deploy:restart all continue to use the :runner user when sudoing, but deploy:setup and deploy:cleanup will use the :admin_runner user. The :admin_runner variable is unset, by default, meaning those tasks will sudo as root, but if you want them to run as :runner, just do “set :admin_runner, runner”.
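Per those release notes, restoring the old behavior is a one-liner in your recipe:

# Make deploy:setup and deploy:cleanup sudo as the :runner user
# instead of root (straight from the 2.4 release notes above).
set :admin_runner, runner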

My recommendation for what to do next: manually stop the mongrels and clean up the PID files, start the mongrels by hand, and then keep running cap deploy:restart while you debug the problem. Repeat as necessary.
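To make that debug loop quicker, a small task can dump the pid-file and process state after each attempt. A sketch (the pid directory is an assumption; match it to your mongrel_cluster.yml):

desc "Show mongrel pid files and processes (diagnostic sketch)."
task :mongrel_status, :roles => :app do
  # Trailing 'true' keeps Capistrano from aborting when nothing matches.
  run "ls -l #{shared_path}/pids 2>/dev/null; true"
  run "ps aux | grep [m]ongrel_rails; true"
end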

answered by Ryan McGeary