
Why does delayed_job loop when it errors?

I have had several occasions where delayed_job was handling delivery of 4,000 emails, and if something errored partway through (for instance, a call on nil), it would restart the entire process and attempt to send all 4,000 emails again. This goes on indefinitely until I manually kill -9 the entire process.

It's happened so many times to me over the past few years in different applications that I'm curious whether it has happened to anyone else, and what they did to overcome it.

Trip asked Jun 10 '11 15:06

2 Answers

A delayed job is usually just a method that gets executed by a worker in a background process, as opposed to during the main thread of your application (the request lifecycle for a Rails app).
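
For instance (a minimal sketch; the mailer and method names are hypothetical, but the delay proxy is delayed_job's standard enqueueing API):

# Runs synchronously, inside the request:
UserMailer.welcome_email(user.id).deliver

# Runs later, in a background worker process:
UserMailer.delay.welcome_email(user.id)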

If you read the documentation for delayed_job under "Gory Details" it states:

On failure, the job is scheduled again in 5 seconds + N ** 4, where N is the number of retries.

The default Worker.max_attempts is 25. After this, the job is either deleted (default) or left in the database with "failed_at" set. With the default of 25 attempts, the last retry will be 20 days later, with the last interval being almost 100 hours.
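
To see what that backoff looks like in practice, here is a quick back-of-the-envelope sketch of the retry schedule (a calculation for illustration, not part of delayed_job itself):

# Approximate wait before each retry: 5 + N ** 4 seconds,
# where N is the number of retries so far.
(1..24).each do |n|
  delay = 5 + n ** 4
  puts "retry #{n}: #{delay} seconds (~#{(delay / 3600.0).round(1)} hours)"
end
# Retry 24 waits 331,781 seconds, roughly 92 hours
# (the "almost 100 hours" mentioned above).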

It sounds like what you are describing is the way delayed_job is intended to function: if a job to send 4,000 emails fails after sending 3,000 of them, it simply starts over. You will probably need to keep track of what has and hasn't been sent, so your job can loop over only the "unsent" emails (or whatever state is appropriate for your background process). That way, when 3,000 emails are sent they get marked as "sent", and if the job fails, the retry loads only the remaining 1,000 "unsent" emails and attempts to send those.
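
A sketch of that idea (the Email model, its sent_at column, and the deliver! method are assumptions for illustration; the Struct-based job and Delayed::Job.enqueue are delayed_job's documented pattern):

class NewsletterJob < Struct.new(:newsletter_id)
  # delayed_job calls #perform on the enqueued object in a worker.
  def perform
    # Only load emails not yet marked sent, so a retry after a
    # failure resumes where the last run stopped.
    Email.where(newsletter_id: newsletter_id, sent_at: nil).find_each do |email|
      email.deliver!                        # hypothetical delivery method
      email.update!(sent_at: Time.current)  # mark sent immediately
    end
  end
end

# Enqueue it with: Delayed::Job.enqueue(NewsletterJob.new(newsletter.id))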

If you really don't want jobs to retry themselves on failure, you can add the following code to your project:

# config/initializers/delayed_job_config.rb
# Give up after the first failure instead of retrying.
Delayed::Worker.max_attempts = 1
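
Relatedly, setting Delayed::Worker.destroy_failed_jobs = false keeps permanently failed jobs in the table with "failed_at" set instead of deleting them, which makes failures easier to inspect afterwards.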

Brett Bender answered Nov 02 '22 23:11

We have a rule with delayed jobs for this exact reason: each job must be atomic. If a job fails for any reason (an exception, a network error, etc.), there must be no side effects.

For jobs that only modify the database, the solution is easy: wrap the job in a transaction.
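
For example (a minimal sketch; the Product model is an assumption, the rollback behaviour is standard ActiveRecord):

class BulkPriceUpdateJob < Struct.new(:percent)
  def perform
    # If anything raises, every change rolls back and a retry
    # starts from a clean slate: no partial side effects.
    ActiveRecord::Base.transaction do
      Product.find_each do |product|
        product.update!(price: product.price * (1 + percent / 100.0))
      end
    end
  end
end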

For jobs that interact with external services (sending email, hitting an API, etc.), we try to break the work into many small, separate jobs.

In your case, we would create 4,000 jobs, one to send each email. If some of them fail, they retry individually instead of re-sending emails to everyone else over and over.
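
A sketch of that pattern (NewsletterMailer and its weekly method are hypothetical; the delay proxy is delayed_job's documented way to enqueue a mailer call as its own job):

# Instead of one job that sends 4,000 emails,
# enqueue 4,000 jobs that each send one.
User.where(subscribed: true).find_each do |user|
  NewsletterMailer.delay.weekly(user.id)
end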

James Healy answered Nov 03 '22 00:11