Is it the actual process in which Celery is running, or is it another process? In Flower I can see multiple processes in a worker pool. What is the difference between these two?
When you run a Celery worker, it creates one parent process to manage the running tasks. This process handles bookkeeping features like sending and receiving queue messages, registering tasks, killing hung tasks, tracking status, etc.
That process then spawns N child worker processes that actually execute the individual tasks. The number N is determined by the -c argument when starting the worker:
http://docs.celeryproject.org/en/latest/userguide/workers.html#concurrency
The child processes can be implemented using a number of strategies, and the strategy is configured when starting the worker with the -P argument. Possible values include prefork (the default), eventlet, gevent, threads, and solo.
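For example (the app module name "proj" here is a placeholder), the pool size and type are set on the command line when starting the worker:

```shell
# start a worker with 4 prefork (subprocess) pool children
celery -A proj worker --concurrency=4 -P prefork

# or use an eventlet pool with 100 green threads instead
celery -A proj worker -c 100 -P eventlet
```

With prefork, each pool slot is a real OS child process; with eventlet or gevent, the "pool" is a set of green threads inside a single process, which is why the process counts shown in Flower differ between pool types.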
It turns out that Celery nodes are indirectly documented here:
https://docs.celeryproject.org/en/latest/reference/celery.bin.multi.html#celery.bin.multi.MultiTool.MultiParser.Node
In short, Celery uses a set of terms that are useful to understand when building a system of distributed work:
- Client: the application code that requests work by sending task messages.
- Broker: the message transport (e.g. RabbitMQ or Redis) that queues and delivers those messages.
- Worker: a process that consumes task messages and executes the corresponding tasks.
At this point, note that the Client, Broker, and Worker can all be on different machines; in fact there can be multiple Clients on different machines and multiple Workers on different machines, as long as they all use the same Broker.
It should be no surprise, then, that an application typically identifies the Broker by a URL: all the Clients and Workers are configured with the same Broker URL and hence all use the same Broker.
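As a sketch (the app name and broker URL here are placeholders), every Client and Worker constructs its Celery application with the same broker URL, and that shared URL is what connects them:

```python
from celery import Celery

# Both the client code and the worker process create the app like this;
# pointing at the same broker URL is what ties them together.
app = Celery("proj", broker="amqp://guest@localhost//")

@app.task
def add(x, y):
    return x + y

# A Client requests work by sending a message via the Broker:
#   add.delay(2, 3)
# Any Worker configured with the same broker URL may consume it.
```

This is a configuration sketch, not a runnable demo: it assumes Celery is installed and a broker is listening at that URL.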
The Clients send (produce) messages via the Broker, requesting tasks to run, and the Workers read (consume) those messages.
Now these terms all have a place:
Each Worker can process multiple tasks at once by maintaining an execution pool. This pool might be threads, or (by default) it is subprocesses, so a Worker may have a number of pool processes as children.
One frustration I have with Celery is that you can communicate liberally with Workers, but not with the running tasks in a Worker's execution pool (for that reason I am creating a new Task class for interactive tasks, but it is still evolving).
A Node is just a Worker in a Cluster; in short, Node = Worker. A Cluster is a number of Workers running in parallel (using celery multi, as per the document linked above). A Cluster is just a convenient way of starting, stopping, and managing multiple Workers on the same machine.
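For instance (again assuming a placeholder app module "proj"), celery multi lets you bring up and tear down several named Workers, i.e. Nodes, in one command:

```shell
# start a Cluster of three Workers (Nodes) on this machine
celery multi start w1 w2 w3 -A proj -l INFO

# stop the same three Nodes again
celery multi stop w1 w2 w3
```

Each of w1, w2, and w3 is a separate Worker with its own parent process and its own execution pool, all consuming from the same Broker.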
There may be many Clusters all consuming tasks from the same Broker, and they may be on the same machine (though one would wonder why) or on different machines.
And that is what a Celery Node is ... (in its fullest context).