I need help understanding how Bull Queue (bull.js) processes concurrent jobs.
Suppose I have 10 Node.js instances that each instantiate a Bull Queue connected to the same Redis instance:
const bullQueue = require('bull');
const queue = new bullQueue('taskqueue', {...})
const concurrency = 5;
queue.process('jobTypeA', concurrency, job => {...do something...});
Does this mean that globally across all 10 node instances there will be a maximum of 5 (concurrency) concurrently running jobs of type jobTypeA
? Or am I misunderstanding and the concurrency setting is per-Node instance?
What happens if one Node instance specifies a different concurrency value?
Can I be certain that jobs will not be processed by more than one Node instance?
The TL;DR is: under normal conditions, jobs are being processed only once. If things go wrong (say Node.js process crashes), jobs may be double processed.
Quoting from Bull's official README.md:
Important Notes
The queue aims for an "at least once" working strategy. This means that in some situations, a job could be processed more than once. This mostly happens when a worker fails to keep a lock for a given job during the total duration of the processing.
When a worker is processing a job it will keep the job "locked" so other workers can't process it.
It's important to understand how locking works to prevent your jobs from losing their lock - becoming stalled - and being restarted as a result. Locking is implemented internally by creating a lock for
lockDuration
on intervallockRenewTime
(which is usually halflockDuration
). IflockDuration
elapses before the lock can be renewed, the job will be considered stalled and is automatically restarted; it will be double processed. This can happen when:
- The Node process running your job processor unexpectedly terminates.
- Your job processor was too CPU-intensive and stalled the Node event loop, and as a result, Bull couldn't renew the job lock (see #488 for how we might better detect this). You can fix this by breaking your job processor into smaller parts so that no single part can block the Node event loop. Alternatively, you can pass a larger value for the lockDuration setting (with the tradeoff being that it will take longer to recognize a real stalled job).
As such, you should always listen for the
stalled
event and log this to your error monitoring system, as this means your jobs are likely getting double-processed.As a safeguard so problematic jobs won't get restarted indefinitely (e.g. if the job processor aways crashes its Node process), jobs will be recovered from a stalled state a maximum of
maxStalledCount
times (default:1
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With