
Celery design help: how to prevent concurrently executing tasks

I'm fairly new to Celery/AMQP and am trying to come up with a task/queue/worker design to meet the following requirements.

I have multiple types of "per-user" tasks: e.g., TaskA, TaskB, TaskC. Each of these "per-user" tasks reads and writes data for one particular user in the system. So at any given time, I might need to create tasks User1_TaskA, User1_TaskB, User1_TaskC, User2_TaskA, User2_TaskB, etc. I need to ensure that, for each user, no two tasks of any task type execute concurrently. I want a system in which no worker can execute User1_TaskA while any other worker is executing User1_TaskB or User1_TaskC, but while User1_TaskA is executing, other workers shouldn't be blocked from concurrently executing User2_TaskA, User3_TaskA, etc.

I realize this could be implemented using some sort of external locking mechanism (e.g., in the DB), but I'm hoping there's a more elegant task/queue/worker design that would work.

I suppose one possible solution is to implement queues as user buckets: when the workers are launched, config specifies how many buckets to create, and each "bucket worker" is bound to exactly one bucket. An "intermediate worker" would then pull tasks off the main task queue and assign them to the bucketed queues via, say, a hash/mod scheme. So User1's tasks would always end up in the same queue, and multiple tasks for User1 would back up behind each other. I don't love this approach, as it would require the number of buckets to be defined ahead of time and would seem to make it hard to add workers dynamically. It seems to me there's got to be a better way. Suggestions would be greatly appreciated.
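For illustration, the hash/mod assignment described above might look roughly like the sketch below. The bucket count, the `user_bucket_` queue-name prefix, and the choice of CRC32 as the hash are all assumptions made for the example, not part of any existing setup.

```python
import zlib

# Assumption: the bucket count is fixed in config when the workers are launched.
NUM_BUCKETS = 8


def bucket_queue_for(user_id: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Map a user ID to a stable bucket queue name (hypothetical naming scheme)."""
    # zlib.crc32 gives the same value in every process, unlike the built-in
    # hash() on strings, which is randomized per interpreter run.
    bucket = zlib.crc32(user_id.encode("utf-8")) % num_buckets
    return f"user_bucket_{bucket}"
```

A producer could then pass the result as the `queue` option to `apply_async`, e.g. `task_a.apply_async(args=[user_id], queue=bucket_queue_for(user_id))`, where `task_a` is a placeholder task name. The scheme only serializes a user's tasks if each bucket queue is consumed by a single worker processing one task at a time (for instance, a worker started with `-Q user_bucket_3 --concurrency=1`).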

Matt asked Mar 21 '12 19:03



1 Answer

What's so bad about using an external locking mechanism? It's simple, straightforward, and efficient enough. You can find an example of distributed task locking in Celery here. Extend it by creating a lock per user, and you're done!
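As a rough sketch of what a per-user lock could look like: the key idea is one lock key per user, shared by all task types, acquired with an atomic "add if absent" operation. Everything named here is hypothetical. `InMemoryCache` is only a single-process stand-in so the example runs on its own; a real deployment would need a store shared by all workers (memcached or Redis, for example) that offers an equivalent atomic add with expiry.

```python
import threading
import time
from contextlib import contextmanager

LOCK_EXPIRE = 60 * 10  # assumption: locks expire after 10 minutes as a safety net


class InMemoryCache:
    """Single-process stand-in for a shared cache with an atomic add()."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def add(self, key, value, timeout):
        """Store value only if key is absent or expired; return True on success."""
        now = time.monotonic()
        with self._guard:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return False
            self._data[key] = (value, now + timeout)
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


@contextmanager
def user_lock(cache, user_id, owner):
    """Try to take the lock for one user; yields True if it was acquired."""
    key = f"user-lock-{user_id}"  # one key per user, shared by all task types
    acquired = cache.add(key, owner, LOCK_EXPIRE)
    try:
        yield acquired
    finally:
        # Only the holder releases the lock.
        if acquired:
            cache.delete(key)


def run_for_user(cache, user_id, task_name, work):
    """Run work() only if no other task currently holds this user's lock."""
    with user_lock(cache, user_id, owner=task_name) as acquired:
        if not acquired:
            return None  # caller decides whether to retry later
        return work()
```

Inside a Celery task, the "not acquired" branch is where a bound task could call `self.retry(countdown=...)` so the work is attempted again once the other task for that user has finished. Tasks for different users use different keys and so never block each other.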

hymloth answered Oct 06 '22 16:10