I have a Sidekiq worker that shouldn't take more than 30 seconds, but after a few days I'll find that the entire worker queue stops executing because all of the workers are locked up.
Here is my worker:
class MyWorker
  include Sidekiq::Worker
  include Sidekiq::Status::Worker

  sidekiq_options queue: :my_queue, retry: 5, timeout: 4.minutes

  sidekiq_retry_in do |count|
    5
  end

  sidekiq_retries_exhausted do |msg|
    store(message: "Gave up.")
  end

  def perform(id)
    begin
      Timeout.timeout(3.minutes) do
        got_lock = with_semaphore("lock_#{id}") do
          # DO WORK
        end
      end
    rescue ActiveRecord::RecordNotFound => e
      # Handle
    rescue Timeout::Error => e
      # Handle
      raise e
    end
  end

  def with_semaphore(name, &block)
    Semaphore.get(name, stale_client_timeout: 1.minute).lock(1, &block)
  end
end
And here is the semaphore class we use (it wraps the redis-semaphore gem):
class Semaphore
  def self.get(name, options = {})
    Redis::Semaphore.new(
      name.to_sym,
      redis: Application.redis,
      stale_client_timeout: options[:stale_client_timeout] || 1.hour
    )
  end
end
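For reference, here is roughly how that wrapper behaves (a sketch, assuming redis-semaphore's Redis::Semaphore#lock API; "lock_123" is just an illustrative name). lock(1) waits up to one second to acquire the semaphore, runs the block while holding it, and returns false if the lock could not be acquired in time, in which case the block never runs:

sem = Semaphore.get("lock_123", stale_client_timeout: 1.minute)

# Returns false if the semaphore could not be acquired within 1 second;
# otherwise runs the block while holding the lock, then releases it.
acquired = sem.lock(1) do
  # critical section
end

warn "could not acquire lock_123" if acquired == false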
Basically, I'll stop the worker and it will report done: 10000 seconds, which the worker should NEVER be running for.
Does anyone have any ideas about what is causing this or how to fix it? The workers are running on EngineYard.
Edit: One additional comment. The # DO WORK step can fire off a PostgreSQL function. I have noticed some mentions in the logs of PG::TRDeadlockDetected: ERROR: deadlock detected. Would this cause the worker to never complete, even with a timeout set?
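I suspect Timeout may not even fire here: Timeout.timeout raises asynchronously via Thread#raise, and as far as I understand that cannot interrupt a thread blocked inside a C extension such as libpq, so a query stuck waiting on a Postgres lock could outlive the Timeout block. One workaround I'm considering is enforcing the limit on the database side instead (a sketch, assuming ActiveRecord on PostgreSQL; the 180-second value is illustrative, mirroring the 3-minute Timeout above):

# Cap any single statement at the database level so a query stuck on a
# Postgres lock errors out instead of hanging the worker thread forever.
ActiveRecord::Base.connection.execute("SET statement_timeout = '180s'")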
Given that you want to ensure unique job execution, I would try removing all of the locking and delegating job-uniqueness control to a plugin like Sidekiq Unique Jobs.
In that case, even if Sidetiq enqueues the same job ID twice, the plugin ensures it will be enqueued/processed only once.
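Roughly, that would look like this (a sketch, assuming the sidekiq-unique-jobs gem; note that the exact option name varies by version, e.g. older releases use unique: while v6+ uses lock:):

class MyWorker
  include Sidekiq::Worker
  # sidekiq-unique-jobs holds a uniqueness lock until the job has executed,
  # so a duplicate enqueue with the same arguments is dropped rather than
  # processed twice -- no hand-rolled Redis semaphore required.
  sidekiq_options queue: :my_queue, retry: 5, lock: :until_executed

  def perform(id)
    # DO WORK
  end
end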