We've got an app that runs resque workers on Heroku. We've installed the New Relic add-on, and according to the docs the New Relic Agent should auto-instrument resque workers. However, we're seeing no output on the "Background Jobs" tab on the New Relic dashboard.
Following the same docs, we didn't touch the newrelic.yml file. We're not sure what's wrong, nor how to debug this effectively. What do we need to do?
It turned out that our problem was caused by having our own custom Resque.before_fork and Resque.after_fork handlers.
New Relic's RPM gem automatically sets up hooks with Resque.before_fork and Resque.after_fork to establish a communication channel for the workers. However, Resque runs only the last block assigned to the before_fork and after_fork hooks. So if you have your own custom before_fork/after_fork hooks, you *must* set up the agent's communication channel by hand, e.g. in a config/initializers/custom_resque.rb file:
Resque.before_fork do |job|
  NewRelic::Agent.register_report_channel(job.object_id)
  # extra custom stuff here
end

Resque.after_fork do |job|
  NewRelic::Agent.after_fork(:report_to_channel => job.object_id)
  # extra custom stuff here
end
This code is taken directly from the RPM gem's file gems/newrelic_rpm-3.5.0/lib/new_relic/agent/instrumentation/resque.rb.
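The overwrite behavior described above can be illustrated with a minimal, hypothetical Hooks module that mirrors Resque's historical single-proc hook storage (the module and method names here are illustrative, not Resque's actual internals):

```ruby
# Sketch of why only the last registered hook runs: each call to
# before_fork *replaces* the stored block instead of appending to a list.
module Hooks
  def self.before_fork(&block)
    @before_fork = block # overwrites any previously registered block
  end

  def self.run_before_fork(job)
    @before_fork && @before_fork.call(job)
  end
end

calls = []
Hooks.before_fork { |job| calls << [:newrelic, job] } # e.g. the RPM gem's hook
Hooks.before_fork { |job| calls << [:custom, job] }   # your hook silently replaces it
Hooks.run_before_fork("job-1")
# calls == [[:custom, "job-1"]] -- the New Relic hook never ran
```

This is why the fix is to fold the agent's channel setup into your own block, rather than registering two separate hooks.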
RPM bug update 12/27/2012: After deploying the technique above, we found that the RPM gem leaks file handles when used in forked mode (e.g. with Resque). We observed error messages like ActiveRecord::StatementInvalid: ArgumentError: too large fdsets: SET client_min_messages TO ''. After a lot of digging we found that these occur when ActiveRecord tries to open a database connection and can't, because the number of file descriptors is exhausted. New Relic confirmed that there is a bug in the agent's explain plan sampling, triggered when lots of Resque jobs run that connect to the DB.
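Since the leak was tied to explain plan sampling, one stopgap (not mentioned in our original write-up, but a standard agent setting) would have been to disable explain plans in newrelic.yml until the fix shipped:

```yaml
# newrelic.yml (per-environment): turn off explain plan collection so the
# agent skips the extra DB work done when sampling slow transactions.
production:
  transaction_tracer:
    enabled: true
    explain_enabled: false   # skip explain plans entirely
```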
Bug update 1/28/2013: After much head scratching, we found that this bug was caused by an unsupported interaction with the resque-lonely_job gem, which uses Resque's before_perform hook and may stop a Resque job by raising a Resque::Job::DontPerform exception. The RPM client doesn't clean up properly in this situation and leaks file descriptors. New Relic has been informed and is working on a fix.
Bug update 4/10/2013: This has been fixed. We're using 3.6.0.78 and it handles this case. No more file descriptor leaks! Thank you New Relic.
I was having the same problem because the New Relic agent wasn't starting within my Resque workers, so I updated my resque:setup rake task to start the agent manually:
task "resque:setup" => :environment do
  if ENV['NEW_RELIC_APP_NAME']
    NewRelic::Agent.manual_start :app_name => ENV['NEW_RELIC_APP_NAME']
  end
end