I recently started working on distributed computing to speed up my computations, and I chose Celery. However, I am not familiar with some of the terms, so I have several related questions.
From the Celery docs:
What's a Task Queue?
...
Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task the client adds a message to the queue, the broker then delivers that message to a worker.
What are clients (here)? What is a broker? Why are messages delivered through a broker? Why would Celery use a backend and queues for interprocess communication?
I start the Celery console by issuing this command:
celery worker -A tasks --loglevel=info --concurrency 5
Does this mean that the Celery console is a worker process which is in charge of 5 different processes and keeps track of the task queue? When a new task is pushed into the task queue, does this worker assign the task/job to any of the 5 processes?
Last question first:
celery worker -A tasks --loglevel=info --concurrency 5
You are correct: the worker controls 5 processes and distributes tasks among them.
A "client" is any code that asks for Celery tasks to be run asynchronously.
There are 2 different types of communication. The first happens when you run apply_async: you send a task request to a broker (most commonly RabbitMQ), which is basically a set of message queues.
The second happens when the workers finish: they put their results into the result backend.
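The flow described above can be modelled with nothing but the standard library. This is only an analogy, not Celery's API: a `queue.Queue` stands in for the broker, a plain dict stands in for the result backend, and threads stand in for the 5 pool processes. The names `send_task`, `pool_process`, `run_demo`, and `TASKS` are made up for this sketch.

```python
import queue
import threading
import uuid

# Stand-ins, not Celery objects: a Queue plays the broker,
# a dict plays the result backend.
broker = queue.Queue()
result_backend = {}
backend_lock = threading.Lock()


def add(x, y):
    """Example task body."""
    return x + y


# Registry the consumers use to look up a task by name.
TASKS = {"add": add}


def send_task(name, *args):
    """Client side: put a message on the broker and return its task id."""
    task_id = str(uuid.uuid4())
    broker.put({"id": task_id, "task": name, "args": args})
    return task_id


def pool_process():
    """Worker side: consume messages until a None sentinel arrives."""
    while True:
        message = broker.get()
        if message is None:
            broker.task_done()
            break
        result = TASKS[message["task"]](*message["args"])
        with backend_lock:
            # Store the result under the task id so the client can find it.
            result_backend[message["id"]] = result
        broker.task_done()


def run_demo(concurrency=5):
    """Start the consumers, send ten tasks, and collect the results in order."""
    pool = [threading.Thread(target=pool_process) for _ in range(concurrency)]
    for thread in pool:
        thread.start()
    task_ids = [send_task("add", i, i) for i in range(10)]
    broker.join()  # wait until every message has been processed
    for _ in pool:
        broker.put(None)  # one shutdown sentinel per consumer
    for thread in pool:
        thread.join()
    return [result_backend[task_id] for task_id in task_ids]
```

The client only ever touches the queue and the result store, never the consumers directly. That decoupling is what lets clients and workers run on different machines once the queue is a real broker.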
The broker and the result backend are quite separate and require different kinds of software to function optimally.
You can use RabbitMQ for both, but once you reach a certain rate of messages it will not work properly. The most common combination is RabbitMQ for the broker and Redis for the results.
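As a rough illustration of that combination, a `tasks.py` module (the name matches the `-A tasks` argument in the command above) might look like the sketch below. The URLs are assumptions for a local RabbitMQ and Redis with default settings, and `add` is just a placeholder task; adjust both for your setup.

```python
from celery import Celery

# Assumed local URLs: RabbitMQ as the broker, Redis as the result backend.
app = Celery(
    "tasks",
    broker="amqp://guest:guest@localhost:5672//",
    backend="redis://localhost:6379/0",
)


@app.task
def add(x, y):
    """Placeholder task; a worker runs this when a message arrives."""
    return x + y
```

With a worker running, client code would call `result = add.apply_async(args=(2, 3))` to send the message to the broker, and `result.get(timeout=10)` to read the value back from the result backend.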