
delayed_job stops running after some time in production

In production, our delayed_job process is dying for some reason. I'm not sure if it's crashing or being killed by the operating system or what. I don't see any errors in the delayed_job.log file.

What can I do to troubleshoot this? I was thinking of installing monit to monitor it, but that will only tell me precisely when it dies. It won't really tell me why it died.

Is there a way to make it more chatty to the log file, so I can tell why it might be dying?

Any other suggestions?

Joelio asked Dec 02 '10 00:12


1 Answer

I've come across two causes of delayed_job failing silently. The first is actual segfaults when people were using libxml in forked processes (this popped up on the mailing list some time back).

The second is a problem in version 1.1.0 of the daemons gem, which delayed_job relies on (https://github.com/collectiveidea/delayed_job/issues#issue/81). This can be easily worked around by using daemons 1.0.10, which is what my own Gemfile has in it.
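Pinning the version in a Gemfile would look roughly like this. It is a sketch: only the gem name and version number come from the text above, and the comment is an illustration.

```ruby
# Pin daemons to the release before 1.1.0 to avoid the issue linked above
gem 'daemons', '1.0.10'
```

After changing the Gemfile, the bundle has to be updated and the workers restarted for the pin to take effect.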

Logging

There is logging in delayed_job, so if a worker dies without printing an error, it is usually because it died without raising a Ruby exception (e.g. a segfault) or because something external killed the process.
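One way to make a worker a bit more chatty about how it ended is an at_exit hook that logs the reason for the exit. The following is a minimal sketch, not something delayed_job provides: exit_reason and install_exit_logger are hypothetical names, and the log path and the idea of calling it from an initializer are assumptions.

```ruby
require 'logger'

# Hypothetical helper: turns the exception that ended the process
# (the value of $! inside at_exit) into a short description.
def exit_reason(error)
  case error
  when nil             then "clean exit"
  when SystemExit      then "exit called with status #{error.status}"
  when SignalException then "terminated by signal #{error.message}"
  else "unhandled #{error.class}: #{error.message}"
  end
end

# Hypothetical helper: logs the exit reason when the process shuts down.
def install_exit_logger(logger)
  at_exit { logger.info("pid #{Process.pid} exiting: #{exit_reason($!)}") }
end

# Usage (assumption: called once from code the worker loads, e.g. an initializer):
#   install_exit_logger(Logger.new("log/delayed_job_exit.log"))
```

A segfault or a kill -9 never reaches at_exit, so a worker that disappears without leaving this line in the log was most likely crashed or killed from outside rather than stopped by a Ruby exception.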

Monitoring

I use bluepill to monitor my delayed_job instances, and so far it has been very successful at keeping the jobs running. The steps to get bluepill running for an application are quite easy:

Add the bluepill gem to your Gemfile:

# Monitoring
gem 'i18n' # Not sure why but it complained I didn't have it
gem 'bluepill'

I created a bluepill config file:

# Root of the deployment and the number of delayed_job workers to run
app_home = "/home/mi/production"
workers = 5
Bluepill.application("mi_delayed_job", :log_file => "#{app_home}/shared/log/bluepill.log") do |app|
  # Define one monitored process per worker: delayed_job.0 up to delayed_job.4
  (0...workers).each do |i|
    app.process("delayed_job.#{i}") do |process|
      process.working_dir = "#{app_home}/current"

      # How long bluepill waits after each action before it resumes monitoring
      process.start_grace_time    = 10.seconds
      process.stop_grace_time     = 10.seconds
      process.restart_grace_time  = 10.seconds

      # -i gives each worker its own identifier
      process.start_command = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job start -i #{i}"
      process.stop_command  = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job stop -i #{i}"

      # bluepill reads the worker's pid from this file
      process.pid_file = "#{app_home}/shared/pids/delayed_job.#{i}.pid"
      # Run the worker as this user and group
      process.uid = "mi"
      process.gid = "mi"
    end
  end
end
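While troubleshooting, it can also help to check by hand which workers are still alive, using the same pid files the config points at. This is a rough sketch and not how bluepill itself is implemented: worker_alive? and dead_workers are hypothetical helpers, and the only thing taken from the config is the delayed_job.N.pid naming.

```ruby
# Hypothetical helper: true if the process recorded in pid_file is running.
def worker_alive?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  Process.kill(0, pid) # signal 0 sends nothing, it only checks the process exists
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  false # no pid file, no such process, or the file did not contain a number
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

# Hypothetical helper: indexes of workers whose pid file is missing or stale.
def dead_workers(pid_dir, workers)
  (0...workers).reject do |i|
    worker_alive?(File.join(pid_dir, "delayed_job.#{i}.pid"))
  end
end

# Usage (paths as in the config above):
#   dead_workers("/home/mi/production/shared/pids", 5)
```

Running this from cron or a console shortly after a worker disappears gives a timestamp to line up against the system logs.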

Then in my Capistrano deploy file I just added:

# Bluepill related tasks
after "deploy:update", "bluepill:quit", "bluepill:start"
namespace :bluepill do
  desc "Stop processes that bluepill is monitoring and quit bluepill"
  task :quit, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged stop"
    run "cd #{current_path} && bundle exec bluepill --no-privileged quit"
  end

  desc "Load bluepill configuration and start it"
  task :start, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged load /home/mi/production/current/config/delayed_job.bluepill"
  end

  desc "Prints the statuses of bluepill's monitored processes"
  task :status, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged status"
  end
end

Hope this helps a little.

Luke Chadwick answered Oct 06 '22 01:10