Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Celery discover new Nodes?

I'm running Celery and RabbitMQ Gunicorn in Docker.

My question is this: I understand that Celery is designed for distributed processing. What I have see no docs on at all is, assuming that I have several machines/nodes on the same LAN, how do they discover each other? Does RabbitMQ play a role? Do celery instances somehow discover each other? Is there a list of suitable hosts somewhere? If so, how do I edit it?

Also, assuming I'm going to use only one node to handle the HTTP requests, do I still need to have gunicorn running on all nodes? I ask this because in the gunicorn start command, it has a setting for the number of workers. And, is this setting applicable only to that node, or as a max total for all connected nodes?

EDIT: After the first answer, I started working on this. It seems that I need some sort of networking setup, either swarm or bridging etc. I should clarify that I'm using docker-compose to bring up the solution, and I see that a normal swarm setup doesn't work, and I have to use something slightly different if I go that route.

To be clear: I need a way in which I can add celery workers on separate hosts and have them be able to communicate with the "main" host so that I can increase the capacity of the system. If someone could provide a clear process for achieving this or a link to such, it'd be most helpful.

I hope I've expressed this clearly, please let me know if you need any further info.

Thanks!

like image 558
Bruce Avatar asked Feb 03 '17 19:02

Bruce


People also ask

How does celery define workers?

Celery worker is a simple, flexible, and reliable distributed system to process vast amounts of messages while providing operations with the tools required to maintain such a system. In this tutorial, we will learn how to implement Celery with Flask and Redis.

What are celery nodes?

A Node is just a Worker in a Cluster. In short Node = Worker. A Cluster is a number of Workers running in parallel (using celery multi as per the document I introduced with). A Cluster is just a convenient way of starting and stopping and managing multiple workers on the same machine.


2 Answers

I feel like @ffledgling didn't fully answer the question so I am adding a note:

Here is a list of all events sent by the worker to the broker (in your case RabbitMq): http://docs.celeryproject.org/en/latest/userguide/monitoring.html#event-reference

As you can see, there are few worker self-related messages/events:

  • worker-online
  • worker-heartbeat
  • worker-offline

All of them contain a signature of the hostname. Therefore a successful handshake flow (not exactly handshake because master doesn't respond with message but using it as a metaphor here) may look like this:

> new worker online --> worker send worker-online message to the queue --> master received and start to read logs from worker host --> master schedule tasks --> ...

Beyond that, host name is a standard body field in every event (both task and worker self-related), here is the documentation: http://docs.celeryproject.org/en/latest/internals/protocol.html?highlight=event%20reference#standard-body-fields

For example, if you look at task-started event: it also contains a hostname as signature, this is how the master knows who picked up the task and where to read the log of the task from.

like image 200
P. Xie Avatar answered Sep 29 '22 07:09

P. Xie


I understand that Celery is designed for distributed processing. What I have see no docs on at all is, assuming that I have several machines/nodes on the same LAN, how do they discover each other? Does RabbitMQ play a role? Do celery instances somehow discover each other? Is there a list of suitable hosts somewhere? If so, how do I edit it?

Celery is a distributed task queue that works using a message brokering system such as RabbitMQ.

What essentially happens all celery workers connect a shared Queue such as RabbitMQ. The master(s) dispatch work by pushing it onto the queue. Workers who are connected to the Queue as well, pull work off of the queue and then attempt to execute it. Once it is finished (successfully or otherwise), it will push the results back onto the Queue, which the master(s) can then query.

Given this architecture, you do not need to add a list of hosts, they "auto-detect" work. You simply need to start them up and ensure they can talk to the Queue.

A slightly more detailed explanation from another SO answer.

Link to the architecture with a diagram.

Also, assuming I'm going to use only one node to handle the HTTP requests, do I still need to have gunicorn running on all nodes? I ask this because in the gunicorn start command, it has a setting for the number of workers. And, is this setting applicable only to that node, or as a max total for all connected nodes?

No, you do not need guicorn running on all the nodes, just the one you're using to serve HTTP requests via python. Celery workers do not need guicorn. The worker setting in guicorn refers to the number of workers in the HTTP listeners pool. This is separate, independent and unrelted to the set of workers that celery uses.

like image 31
ffledgling Avatar answered Sep 29 '22 08:09

ffledgling