In production, our delayed_job process is dying for some reason. I'm not sure if it's crashing or being killed by the operating system or what. I don't see any errors in the delayed_job.log file.
What can I do to troubleshoot this? I was thinking of installing monit to monitor it, but that will only tell me precisely when it dies. It won't really tell me why it died.
Is there a way to make it more chatty to the log file, so I can tell why it might be dying?
Any other suggestions?
I've come across two causes of delayed_job failing silently. The first is actual segfaults when people were using libxml in forked processes (this popped up on the mailing list some time back).
The second is a problem in version 1.1.0 of the daemons gem, which delayed_job relies on (https://github.com/collectiveidea/delayed_job/issues#issue/81). This is easily worked around by using 1.0.10, which is what my own Gemfile has in it.
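For what it's worth, pinning the daemons version in a Gemfile would look roughly like the fragment below. It only sketches the workaround described above; check the linked issue to see whether the pin is still needed for the delayed_job version you run.

```ruby
# Gemfile fragment: pin daemons to the release that predates the 1.1.0 problem
gem 'daemons', '1.0.10'
```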
delayed_job does log errors, so if a worker dies without printing one, it's usually because it went down without raising an exception (e.g. a segfault) or because something external killed the process.
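One way to tell those two cases apart is to run the worker as a child of a small wrapper and look at how the child ended. The sketch below is only an illustration: `describe_exit` and `run_and_report` are hypothetical helper names, and it uses nothing beyond Ruby's standard `Process` and `Signal` APIs. A process that was killed from outside or crashed hard normally ends on a signal, while one that shut down by itself ends with an exit status.

```ruby
require 'rbconfig'

# Hypothetical helper: turn a Process::Status into a human-readable reason.
def describe_exit(status)
  if status.signaled?
    # Ended on a signal, e.g. killed by the OS or by another process
    "killed by signal #{status.termsig} (SIG#{Signal.signame(status.termsig)})"
  elsif status.exited?
    # The process called exit (or finished) by itself
    "exited with status #{status.exitstatus}"
  else
    "ended in an unknown way: #{status.inspect}"
  end
end

# Hypothetical helper: run a command in the foreground, wait for it,
# and report why it stopped.
def run_and_report(*command)
  pid = Process.spawn(*command)
  _, status = Process.wait2(pid)
  describe_exit(status)
end

if __FILE__ == $PROGRAM_NAME
  # Demo only: a child Ruby process that kills itself with SIGKILL
  puts run_and_report(RbConfig.ruby, "-e", "Process.kill('KILL', Process.pid)")
end
```

To use this against a real worker you would pass a command that keeps the worker in the foreground (for example `rake jobs:work`, assuming your delayed_job version provides that task) and write the returned string to a log file.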
I use bluepill to monitor my delayed_job instances, and so far it has been very successful at ensuring that the jobs remain running. The steps to get bluepill running for an application are quite easy.
Add the bluepill gem to your Gemfile:
# Monitoring
gem 'i18n' # Not sure why but it complained I didn't have it
gem 'bluepill'
I created a bluepill config file:
app_home = "/home/mi/production"
workers = 5
Bluepill.application("mi_delayed_job", :log_file => "#{app_home}/shared/log/bluepill.log") do |app|
  (0...workers).each do |i|
    app.process("delayed_job.#{i}") do |process|
      process.working_dir = "#{app_home}/current"
      process.start_grace_time = 10.seconds
      process.stop_grace_time = 10.seconds
      process.restart_grace_time = 10.seconds
      process.start_command = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job start -i #{i}"
      process.stop_command = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job stop -i #{i}"
      process.pid_file = "#{app_home}/shared/pids/delayed_job.#{i}.pid"
      process.uid = "mi"
      process.gid = "mi"
    end
  end
end
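If it helps to see what that loop expands to, here is a small plain-Ruby sketch that builds the same names, commands and pid file paths without needing bluepill. `worker_settings` is a hypothetical helper, not part of bluepill or delayed_job. The detail worth checking is that the `-i` identifier in the command and the number in the pid file name line up for every worker. I'm assuming here that delayed_job names its pid file after the identifier, so verify that against your own pids directory.

```ruby
# Hypothetical helper mirroring the config above: for each worker index,
# list the process name, start command and pid file handed to bluepill.
def worker_settings(app_home, workers)
  (0...workers).map do |i|
    {
      name: "delayed_job.#{i}",
      start_command: "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job start -i #{i}",
      pid_file: "#{app_home}/shared/pids/delayed_job.#{i}.pid"
    }
  end
end

# Print one line per worker so the mapping can be checked by eye
worker_settings("/home/mi/production", 5).each do |w|
  puts "#{w[:name]} -> #{w[:pid_file]}"
end
```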
Then in my capistrano deploy file I just added:
# Bluepill related tasks
after "deploy:update", "bluepill:quit", "bluepill:start"
namespace :bluepill do
  desc "Stop processes that bluepill is monitoring and quit bluepill"
  task :quit, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged stop"
    run "cd #{current_path} && bundle exec bluepill --no-privileged quit"
  end
  desc "Load bluepill configuration and start it"
  task :start, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged load /home/mi/production/current/config/delayed_job.bluepill"
  end
  desc "Print the status of bluepill's monitored processes"
  task :status, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged status"
  end
end
Hope this helps a little.