When we restart or deploy, we get a number of Resque jobs in the failed queue with either Resque::TermException (SIGTERM) or Resque::DirtyExit.
We're using the new TERM_CHILD=1 and RESQUE_TERM_TIMEOUT=10 settings in our Procfile, so our worker line looks like:
worker: TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake environment resque:work QUEUE=critical,high,low
We're also using resque-retry, which I thought might auto-retry on these two exceptions, but it seems not to.
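For reference, here's how a job is typically configured with resque-retry so that these shutdown exceptions are explicitly retried. This is a sketch, not verified against any particular resque-retry version; the job name, queue, and retry values are illustrative:

```ruby
class ArchiveJob
  extend Resque::Plugins::Retry

  @queue = :critical

  # Explicitly list the exceptions raised on worker shutdown so they
  # are retried (by default resque-retry retries all exceptions, so
  # check whether a narrower @retry_exceptions elsewhere is excluding
  # these two).
  @retry_exceptions = [Resque::TermException, Resque::DirtyExit]
  @retry_limit = 3   # illustrative values
  @retry_delay = 60

  def self.perform(*args)
    # job body
  end
end
```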
So I guess two questions:
Is resque-retry supposed to auto-retry on these exceptions, and if so, why isn't it?
I could rescue Resque::TermException in each job and use this to reschedule the job, but is there a clean way to do this for all jobs? Even a monkey patch. Thanks!
Edit: Getting all jobs to complete in less than 10 seconds seems unreasonable at scale. It seems like there needs to be a way to automatically re-queue these jobs when the Resque::DirtyExit exception is raised.
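The rescue-and-requeue idea can be illustrated in plain Ruby. Everything below (TermException, FakeQueue, drain) is a hypothetical stand-in, not real Resque code; it only demonstrates the pattern of putting an interrupted job back on the queue instead of losing it:

```ruby
# Stand-in for the exception a terminated worker would raise.
class TermException < StandardError; end

# Minimal in-memory queue standing in for a Resque queue.
class FakeQueue
  attr_reader :jobs

  def initialize(jobs)
    @jobs = jobs
  end

  def pop
    @jobs.shift
  end

  def push(job)
    @jobs.push(job)
  end
end

# Run jobs until the queue is empty; if a "SIGTERM" interrupts a job,
# requeue it so the next worker boot retries it instead of losing it.
def drain(queue, results)
  while (job = queue.pop)
    begin
      results << job.call
    rescue TermException
      queue.push(job) # put the interrupted job back on the queue
      break           # the worker is shutting down
    end
  end
end

results = []
interrupted = -> { raise TermException, "SIGTERM" }
queue = FakeQueue.new([-> { :done }, interrupted])
drain(queue, results)
results    # completed jobs: [:done]
queue.jobs # the interrupted job is back on the queue for the next worker
```

A real version would hook this rescue into the worker (or a Resque failure backend) rather than each job class, which is what the gem mentioned in the answer below effectively arranges.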
I ran into this issue as well. It turns out that Heroku sends the SIGTERM signal not just to the parent process but to all forked processes. This is not the behavior Resque expects, which causes the RESQUE_PRE_SHUTDOWN_TIMEOUT to be skipped, forcing jobs to be terminated immediately without any time to attempt to finish.
Heroku gives workers 30s to shut down gracefully after a SIGTERM is issued. In most cases this is plenty of time to finish a job, with some buffer left over to requeue it to Resque if it couldn't finish. However, for all of this time to be used, you need to set the RESQUE_PRE_SHUTDOWN_TIMEOUT and RESQUE_TERM_TIMEOUT env vars, as well as patch Resque to correctly respond to SIGTERM being sent to forked processes.
Here's a gem which patches resque and explains this issue in more detail:
https://github.com/iloveitaly/resque-heroku-signals
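With that gem in place, a Procfile worker line might look like the following. The specific timeout values are illustrative, not prescriptive; the key point is that the two timeouts together should stay under Heroku's 30s shutdown window:

```
worker: TERM_CHILD=1 RESQUE_PRE_SHUTDOWN_TIMEOUT=20 RESQUE_TERM_TIMEOUT=8 bundle exec rake environment resque:work QUEUE=critical,high,low
```

Here the worker gets up to 20s of pre-shutdown grace to finish the current job, then up to 8s after the child is signaled, leaving a small buffer before Heroku hard-kills the dyno.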