
How to monitor resque workers in New Relic when running on Heroku?

We've got an app that runs resque workers on Heroku. We've installed the New Relic add-on, and according to the docs the New Relic Agent should auto-instrument resque workers. However, we're seeing no output on the "Background Jobs" tab on the New Relic dashboard.

Following the same docs, we left the newrelic.yml file untouched. We're not sure what's wrong or how to debug this effectively. What do we need to do?

Wolfram Arnold asked Sep 19 '12 22:09


2 Answers

It turned out that our problem was caused by having our own custom Resque.before_fork and Resque.after_fork handlers.

New Relic's RPM gem automatically sets up Resque.before_fork and Resque.after_fork hooks to establish a communication channel for the workers. Because of a limitation in Resque, only the last block/Proc assigned to each of the before_fork and after_fork hooks is run. So, if you have your own custom before_fork/after_fork hooks, you *must* set up the agent's communication channel by hand, e.g. in a config/initializers/custom_resque.rb file:

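Resque.before_fork do |job|
  # Parent process: register the channel the forked worker will report through.
  NewRelic::Agent.register_report_channel(job.object_id)

  # extra custom stuff here
end

Resque.after_fork do |job|
  # Forked child: tell the agent to report to the channel registered above.
  NewRelic::Agent.after_fork(:report_to_channel => job.object_id)

  # extra custom stuff here
end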

This code is taken directly from the RPM gem's file gems/newrelic_rpm-3.5.0/lib/new_relic/agent/instrumentation/resque.rb.
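To see why a second hook assignment can silently drop the first, here is a minimal sketch in plain Ruby. SingleSlotHooks is a hypothetical stand-in that only mimics the "last assigned block wins" behaviour described above; it is not Resque's code, and the :agent/:custom markers are placeholders for the real work.

```ruby
# Hypothetical stand-in that mimics a single-slot hook: each call with a
# block replaces whatever was registered before. This is NOT Resque itself.
module SingleSlotHooks
  class << self
    # With a block: store it, overwriting any earlier one.
    # Without a block: return the currently stored hook.
    def before_fork(&block)
      @before_fork = block if block
      @before_fork
    end
  end
end

calls = []

# First registration, standing in for the hook the agent installs.
SingleSlotHooks.before_fork { |job| calls << [:agent, job] }

# Second registration, standing in for an app's custom hook.
# It replaces the first one instead of being added to it.
SingleSlotHooks.before_fork { |job| calls << [:custom, job] }

# Only the custom hook runs here; the agent's hook was dropped.
SingleSlotHooks.before_fork.call(:job1)

# Workaround: do both pieces of work inside the one block that survives.
SingleSlotHooks.before_fork do |job|
  calls << [:agent, job]   # what the agent's hook would have done
  calls << [:custom, job]  # extra custom stuff
end
SingleSlotHooks.before_fork.call(:job2)

p calls
```

The first call records only the custom marker, while the second records both, which is the same shape as putting the agent calls and your own code in one block.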

RPM bug update 12/27/2012: After deploying the technique above, we found that the RPM gem leaks file handles when used in forked mode (e.g. with Resque). We saw errors such as ActiveRecord::StatementInvalid: ArgumentError: too large fdsets: SET client_min_messages TO ''. After a lot of digging we found that they occur when ActiveRecord tries to open a database connection and can't because the process has run out of file descriptors. New Relic confirmed that there is a bug in the agent when sampling the explain plan. It shows up when many Resque jobs that connect to the DB are run.

Bug update 1/28/2013: After much head scratching we found out that this bug was caused by an unsupported interaction with the resque-lonely_job gem, which uses Resque's before_perform hook and may stop a Resque job by raising a Resque::Job::DontPerform exception. The RPM client doesn't clean up properly in this situation and leaks file descriptors. New Relic has been informed and is working on a fix.

Bug update 4/10/2013: This has been fixed. We're using 3.6.0.78 and it handles this case. No more file descriptor leaks! Thank you New Relic.
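If you need to confirm a descriptor leak like the one in these updates, one rough approach is to log how many file descriptors a worker process holds over time. The sketch below is an assumption on my part, not something from New Relic or from our original debugging: open_fd_count is a hypothetical helper, and it relies on Linux's /proc/self/fd directory.

```ruby
# Count this process's open file descriptors by listing /proc/self/fd.
# Linux-only assumption: returns nil where /proc is not available.
def open_fd_count
  fd_dir = "/proc/self/fd"
  return nil unless File.directory?(fd_dir)

  Dir.children(fd_dir).size
end

before = open_fd_count

# Open a pipe and keep it open, to simulate descriptors that never get closed.
reader, writer = IO.pipe

after = open_fd_count

if before && after
  puts "open descriptors grew by #{after - before}"
else
  puts "/proc/self/fd not available on this platform"
end

# Clean up the simulated leak.
reader.close
writer.close
```

Logging this number from a long-running worker would show a steady climb if descriptors are leaking, and a flat line once the fix is in place.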

Wolfram Arnold answered Sep 22 '22 15:09


I was having the same problem because the New Relic agent wasn't starting within my Resque workers. So I updated my resque:setup rake task to start the agent manually:

task "resque:setup" => :environment do
  # Start the agent by hand, but only when NEW_RELIC_APP_NAME is set.
  if ENV['NEW_RELIC_APP_NAME']
    NewRelic::Agent.manual_start :app_name => ENV['NEW_RELIC_APP_NAME']
  end
end

trliner answered Sep 19 '22 15:09