
Why does Celery need a message broker?

Celery is a job queue/task queue, so as the name suggests, it should be able to maintain its own tasks and process them. Why, then, does it need a message broker like RabbitMQ or Redis?

Shailendra asked Jun 17 '18 05:06


1 Answer

Celery is a distributed task queue. That means the system can be spread across multiple computers (or containers) in multiple locations, all sharing a single, centralized bus.

The basic architecture is as follows:

Workers: processes that take jobs (data) from the bus (the task queue) and process them.

A worker can also put its result back onto the bus for further processing by a different worker, which creates a processing flow.
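To make the "result goes back on the bus" flow concrete, here is a minimal sketch that uses Python's standard `queue.Queue` objects as a toy, in-process stand-in for the bus. The stage functions and data are hypothetical and this is not Celery's API; in Celery itself this kind of multi-step flow is usually expressed with a `chain` from its canvas primitives.

```python
import queue

# Toy stand-in for the bus: one queue per processing stage (not Celery's API).
raw_jobs = queue.Queue()
parsed_jobs = queue.Queue()
results = []


def stage_one_worker():
    """Take raw jobs off the bus, process them, and put the result back on the bus."""
    while not raw_jobs.empty():
        text = raw_jobs.get()
        parsed_jobs.put(len(text))  # result is published for the next stage


def stage_two_worker():
    """A different worker picks up stage-one results and finishes the flow."""
    while not parsed_jobs.empty():
        results.append(parsed_jobs.get() * 10)


# A producer submits jobs to the first stage.
for page in ["a", "bb", "ccc"]:
    raw_jobs.put(page)

stage_one_worker()
stage_two_worker()
print(results)  # [10, 20, 30]
```

Because the two stages only share the bus, they could just as well run in separate processes or on separate machines.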

Bus: the task queue. This is basically a database that stores the jobs as messages so the workers can retrieve them.

It's important that this database is concurrent and non-blocking, so that when one process takes a job from the bus or puts one on it, it doesn't block other workers from getting or putting their own jobs.
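A rough, single-machine illustration of that property: Python's `queue.Queue` is thread-safe, so several worker threads can take jobs concurrently and each job is handed to exactly one worker. This is only a toy model under that assumption; a real broker such as RabbitMQ or Redis provides the equivalent guarantee across processes and machines.

```python
import queue
import threading

# Toy bus: a thread-safe queue shared by several workers.
bus = queue.Queue()
done = []
done_lock = threading.Lock()

for job_id in range(100):
    bus.put(job_id)


def worker(name):
    """Keep taking jobs until the bus is empty; never waits on other workers' jobs."""
    while True:
        try:
            job = bus.get_nowait()  # non-blocking take; raises queue.Empty when drained
        except queue.Empty:
            return
        with done_lock:
            done.append((name, job))
        bus.task_done()


threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

processed = sorted(job for _, job in done)
print(len(processed), processed == list(range(100)))  # 100 True
```

Every job is processed once and only once, even though four workers were pulling from the same bus at the same time.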

RabbitMQ, Redis, ActiveMQ, Kafka and similar systems are the best candidates for this sort of behaviour.

The bus exposes an API that lets you submit jobs for the workers and lets the workers retrieve them (among other, more complex features).

Most buses implement an ack/fail mechanism: a worker acknowledges (acks) a job once it is done, and if it doesn't ack (or reports a failure), the message can be served again to another worker, which may process it successfully this time, so no data is lost. (How well this works depends heavily on the failover logic and on the nature of the data the task takes as input.)
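The ack/redelivery idea can be sketched with a hypothetical in-memory `ToyBroker` class: a message handed to a worker stays "unacked" until the worker confirms it, and a failure puts it back for another worker. The class and its method names are made up for illustration and do not mirror any real broker's API. In Celery, a related knob is the task option `acks_late`, which makes the worker ack after the task runs rather than before.

```python
import collections


class ToyBroker:
    """Hypothetical in-memory broker with ack/nack, for illustration only."""

    def __init__(self):
        self.ready = collections.deque()  # messages waiting for a worker
        self.unacked = {}                 # delivery tag -> message currently being worked on
        self._next_tag = 0

    def publish(self, body):
        self.ready.append(body)

    def get(self):
        """Hand out a message; it stays unacked until the worker confirms it."""
        body = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self.unacked[tag]  # job done: the message is removed for good

    def nack(self, tag):
        # Job failed: requeue the message so another worker can try again.
        self.ready.append(self.unacked.pop(tag))


broker = ToyBroker()
broker.publish("scrape example.com")

# Worker A takes the job but fails, so it reports the failure.
tag, body = broker.get()
broker.nack(tag)

# Worker B receives the same message again and succeeds.
tag, body = broker.get()
broker.ack(tag)

print(body, len(broker.ready), len(broker.unacked))  # scrape example.com 0 0
```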

Celery includes a scheduler (beat) that periodically puts specific jobs on the bus, which is how periodic tasks are created.
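The key point is that beat itself only publishes jobs; the workers still do the actual work. A minimal sketch of that split, using a simulated clock and a `queue.Queue` as a stand-in for the bus (the task names and intervals are hypothetical). In real Celery, the schedule is configured through the `beat_schedule` setting and the scheduler runs as its own process (`celery -A proj beat`).

```python
import queue


def run_beat(bus, schedule, start, end):
    """Toy 'beat': at each simulated second, publish every task that is due.

    `schedule` maps a (hypothetical) task name to its interval in seconds.
    """
    next_due = {name: start for name in schedule}
    for now in range(start, end):
        for name, interval in schedule.items():
            if now >= next_due[name]:
                bus.put((now, name))  # beat only enqueues; workers execute later
                next_due[name] = now + interval


bus = queue.Queue()
run_beat(bus, {"cleanup": 10, "report": 30}, start=0, end=60)
published = [bus.get() for _ in range(bus.qsize())]
print(published)
```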

Let's work through a scraping example. You want to scrape sites all over the world, but sites in China only allow traffic from within that region, and the same goes for Europe and the USA. So you can build workers and place them all over the world.

You can use just one bus; let's say it's located in the USA. All the other workers know this bus and can connect to it, so when you place a specific job (scrape China) on the bus in the US, a worker process in China can pick it up and work on it. Hence, distributed.
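One way to picture this is a single bus holding one named queue per region, with each regional worker consuming only its own queue. The sketch below is a toy model with placeholder domains and a hypothetical routing table. In Celery itself, the usual equivalent is routing tasks to named queues (via the `task_routes` setting or the `queue=` argument to `apply_async`) and starting each regional worker with `-Q <queue>` so it only consumes from that queue.

```python
import queue

# One central bus holding one named queue per region (toy model).
bus = {"china": queue.Queue(), "europe": queue.Queue(), "usa": queue.Queue()}

# Hypothetical routing table: which regional queue each (placeholder) site goes to.
ROUTES = {"example.cn": "china", "example.de": "europe", "example.com": "usa"}


def publish(site):
    """Producer side: put the job on the queue for the region that can reach the site."""
    bus[ROUTES[site]].put(f"scrape {site}")


def regional_worker(region):
    """A worker deployed in `region` consumes only that region's queue."""
    jobs = []
    q = bus[region]
    while not q.empty():
        jobs.append(q.get())
    return jobs


for site in ["example.cn", "example.com", "example.de", "example.cn"]:
    publish(site)

china_jobs = regional_worker("china")
europe_jobs = regional_worker("europe")
usa_jobs = regional_worker("usa")
print(china_jobs)  # ['scrape example.cn', 'scrape example.cn']
```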

Of course, adding workers also increases the system's throughput purely through parallelism, regardless of their geographic location. This is the common case for using an event-driven architecture (i.e. a central bus, consumers and producers).

I suggest reading the official docs; they're pretty straightforward.

shahaf answered Sep 25 '22 09:09