
What is a Celery worker node exactly?

Tags: celery

Is it the actual processor on which Celery is running, or is it another process? In Flower, I can see multiple processes in a worker pool. What is the difference between the two?

aliasav asked Feb 11 '15 08:02


2 Answers

When you run a Celery worker, it creates one parent process to manage the running tasks. This process handles bookkeeping such as sending and receiving queue messages, registering tasks, killing hung tasks, and tracking status.

That process then spawns N child worker processes that actually execute the individual tasks. N is set with the -c argument when starting the worker and defaults to the number of CPUs available: http://docs.celeryproject.org/en/latest/userguide/workers.html#concurrency

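The pool of child workers can be implemented using one of several strategies, chosen with the -P argument when starting the worker. Possible values include prefork, eventlet, gevent, threads and solo. Only prefork uses separate child processes; the others run tasks as threads or green threads inside the worker process, and solo runs them inline one at a time.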
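To make the parent/pool split concrete, here is a toy model using only the Python standard library. It is not how Celery is implemented, just a sketch of the idea: the calling thread plays the parent and only dispatches, while a pool of `CONCURRENCY` threads (standing in for the -c value) does the work. The names `task`, `run_worker` and `CONCURRENCY` are made up for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

CONCURRENCY = 4  # hypothetical value, plays the role of the -c argument


def task(n):
    # Return the result plus the name of the thread that executed it.
    return n * n, threading.current_thread().name


def run_worker(jobs, concurrency=CONCURRENCY):
    # The calling thread plays the parent: it dispatches and collects only.
    with ThreadPoolExecutor(max_workers=concurrency,
                            thread_name_prefix="pool") as pool:
        return list(pool.map(task, jobs))


outcomes = run_worker(range(10))
results = [value for value, _ in outcomes]
executors = {name for _, name in outcomes}

print(results)
print(f"{len(executors)} pool thread(s) did the work; "
      f"the parent was {threading.current_thread().name}")
```

No matter how many jobs are submitted, at most `CONCURRENCY` of them run at once, and none of them run in the parent. That is the same shape you see in Flower: one worker, several pool members underneath it.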

Chris Ward answered Oct 04 '22 19:10


It turns out that Celery nodes are indirectly documented here:

https://docs.celeryproject.org/en/latest/reference/celery.bin.multi.html#celery.bin.multi.MultiTool.MultiParser.Node

In short, Celery uses a set of terms that are useful to understand when building a system of distributed work.

  • Client - An application that wants to see work done
  • Worker - An application that gets the work done

Related terms that help when planning such a system include:

  • Broker - The means by which a Client asks a Worker to do work.
  • Application - An instance of the Celery class

At this point, take note that the Client, Broker and Worker can all be on different machines, and there can in fact be multiple Clients on different machines and multiple Workers on different machines, as long as they use the same Broker.

It should be no surprise, then, that the Application is typically configured with a Broker URL. All the Applications, in all the Clients and Workers, use the same Broker URL and hence the same Broker.

The Clients send (produce) messages via the Broker, requesting tasks to run, and the Workers read (consume) those messages.
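A rough way to picture that flow, using only the standard library: a `queue.Queue` stands in for the Broker, the Client puts a message naming the task, and the Worker looks the name up and runs it. Everything here (`broker`, `registry`, `client_send`, `worker_loop`, the message layout) is invented for the sketch and is not Celery's actual message format or API.

```python
import queue
import threading

broker = queue.Queue()  # stand-in for the shared Broker
results = {}            # stand-in for a result store (hypothetical)
registry = {}           # task name -> function, as known to the Worker


def register(func):
    # Make a function callable by name, like a registered task.
    registry[func.__name__] = func
    return func


@register
def add(x, y):
    return x + y


def client_send(task_id, name, *args):
    # Client side: describe the work in a message; do not run it here.
    broker.put({"id": task_id, "task": name, "args": args})


def worker_loop():
    # Worker side: consume messages until a None sentinel arrives.
    while True:
        message = broker.get()
        if message is None:
            break
        func = registry[message["task"]]
        results[message["id"]] = func(*message["args"])


worker = threading.Thread(target=worker_loop)
worker.start()
client_send("t1", "add", 2, 3)
client_send("t2", "add", 10, 20)
broker.put(None)  # tell the toy worker to shut down
worker.join()

print(results)
```

The Client never calls `add` directly. It only knows the task's name and the Broker, which is why Client and Worker can live on different machines.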

Now these terms all have a place:

  • Execution Pool
  • Cluster
  • Node

Each Worker can process multiple tasks at once by maintaining an execution pool. The pool may consist of threads or, by default, subprocesses. So a Worker may have a number of pool processes as children.

One of the frustrations (I have) with Celery is that you can communicate liberally with Workers but not with the running tasks in a Worker's Execution Pool (for which reason I am creating a new Task class for interactive tasks, but it's still evolving).

A Node is just a Worker in a Cluster. In short, Node = Worker. A Cluster is a number of Workers running in parallel (started with celery multi, as described in the document linked above). A Cluster is just a convenient way of starting, stopping and managing multiple Workers on the same machine.

There may be many Clusters all consuming tasks from the same Broker though and they may be on the same machine (though one would wonder why) or on different machines.
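As a toy picture of several Nodes sharing one Broker (standard library only, not Celery code): three threads play the Nodes, all reading from the same queue, and each message ends up handled by exactly one of them. The node names below are hypothetical and only mimic the name@host style.

```python
import queue
import threading

broker = queue.Queue()   # the one Broker every node consumes from
handled = []             # (node name, task id) pairs, in completion order
lock = threading.Lock()  # protects the shared list


def node(name):
    # Each node competes for messages; None tells it to stop.
    while True:
        task_id = broker.get()
        if task_id is None:
            break
        with lock:
            handled.append((name, task_id))


# Hypothetical node names for the illustration.
node_names = [f"worker{i}@example-host" for i in (1, 2, 3)]
cluster = [threading.Thread(target=node, args=(name,)) for name in node_names]

for thread in cluster:
    thread.start()
for task_id in range(12):
    broker.put(task_id)
for _ in cluster:
    broker.put(None)  # one stop signal per node
for thread in cluster:
    thread.join()

print(sorted(task_id for _, task_id in handled))
```

Which node picks up which task varies from run to run, but no task is dropped or done twice. Adding a second Cluster on another machine would just mean more consumers on the same queue.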

And that is what a Celery Node is ... (in its fullest context).

Bernd Wechner answered Oct 04 '22 21:10