Celery difference between concurrency, workers and autoscaling

In my /etc/defaults/celeryd config file, I've set:

CELERYD_NODES="agent1 agent2 agent3 agent4 agent5 agent6 agent7 agent8"
CELERYD_OPTS="--autoscale=10,3 --concurrency=5"

I understand that the daemon spawns 8 Celery workers, but I'm not fully sure what autoscale and concurrency do together. I thought that concurrency was a way to specify the max number of threads that a worker can use, and that autoscale was a way for the worker to scale child workers up and down if necessary.

The tasks have a largish payload (some 20-50 kB) and there are about 2-3 million such tasks, but each task runs in less than a second. I'm seeing memory usage spike because the broker distributes the tasks to every worker, thus replicating the payload multiple times.

I think the issue is in the config: the combination of workers + concurrency + autoscaling seems excessive. I would like to get a better understanding of what these three options do.

Joseph asked Aug 08 '15 20:08

1 Answer

Let's distinguish between workers and worker processes. You spawn a Celery worker, which then spawns a number of processes (depending on options like --concurrency and --autoscale; the default is to spawn as many processes as there are cores on the machine). There is no point in running more than one worker on a particular machine unless you want to do routing.
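To make the worker vs. worker process distinction concrete, here is a rough back-of-the-envelope sketch for the config in the question. It assumes that --autoscale is read as max,min and that it takes precedence over --concurrency when both are given; the variable names are only for illustration.

```python
# Settings copied from the question's config file.
nodes = "agent1 agent2 agent3 agent4 agent5 agent6 agent7 agent8".split()

# Assumption: --autoscale=10,3 means "at most 10, at least 3" processes
# per worker, and it wins over --concurrency=5 when both are set.
autoscale_max, autoscale_min = 10, 3

workers = len(nodes)                       # one worker per node name
min_processes = workers * autoscale_min    # processes when idle
max_processes = workers * autoscale_max    # processes under load

print(workers, min_processes, max_processes)  # 8 24 80
```

Under those assumptions, a single machine ends up with 8 workers and somewhere between 24 and 80 worker processes, each pulling its own share of messages from the broker.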

I would suggest running only 1 worker per machine with the default number of processes. This will reduce memory usage by eliminating the duplication of data between workers.

If you still have memory issues, then save the data to a store and pass only an id to the workers.
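A minimal sketch of the "pass only an id" idea, in plain Python so it runs without a broker. The dict PAYLOAD_STORE is a hypothetical stand-in for a real shared store (a database, cache, or similar), and save_payload and process_payload are illustrative names, not Celery APIs. In a real setup, process_payload would be registered as a Celery task and the store would have to be reachable from every worker process.

```python
import uuid

# Hypothetical stand-in for an external store shared by producers and workers.
# An in-process dict is only good enough for this demonstration.
PAYLOAD_STORE = {}


def save_payload(payload: bytes) -> str:
    """Save the payload once and return a short key that refers to it."""
    key = uuid.uuid4().hex
    PAYLOAD_STORE[key] = payload
    return key


def process_payload(key: str) -> int:
    """Task body: fetch the payload by key only when the task actually runs.

    The message sent through the broker would carry just `key`,
    so prefetched messages stay small.
    """
    payload = PAYLOAD_STORE.pop(key)  # load the data and remove it from the store
    return len(payload)               # placeholder for the real work


payload = b"x" * 50_000              # roughly the 50 kB mentioned in the question
key = save_payload(payload)
result = process_payload(key)
print(len(key), result)              # 32 50000
```

The message that travels through the queue shrinks from tens of kilobytes to a 32-character key, at the cost of one extra read from the store per task.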

scytale answered Oct 05 '22 22:10
