 

Multiple instances of celerybeat for autoscaled django app on elasticbeanstalk

I am trying to figure out the best way to structure a Django app that uses Celery to handle asynchronous and scheduled tasks in an autoscaling AWS Elastic Beanstalk environment.

So far I have only used a single-instance Elastic Beanstalk environment with Celery + Celerybeat, and this worked perfectly fine. However, I want to have multiple instances running in my environment, because every now and then an instance crashes and it takes a lot of time until it is back up. My current architecture can't scale to more than one instance, though, because Celerybeat is supposed to run only once across all instances; otherwise, every task scheduled by Celerybeat would be submitted multiple times (once per EC2 instance in the environment).

I have read about multiple solutions, but all of them seem to have issues that don't make it work for me:

  • Using django cache + locking: This approach is more of a quick fix than a real solution. It doesn't scale if you have a lot of scheduled tasks, since you need to add code that checks the cache for every task. Also, tasks are still submitted multiple times; this approach only ensures that execution of the duplicates stops.
  • Using the leader_only option with ebextensions: Works fine initially, but if the leader EC2 instance in the environment crashes or is replaced, this leads to a situation where no Celerybeat is running at all, because the leader is only determined once, at the creation of the environment.
  • Creating a new Django app just for async tasks in the Elastic Beanstalk worker tier: Nice, because web servers and workers can be scaled independently and the web server performance is not affected by huge async workloads performed by the workers. However, this approach does not work with Celery because the worker tier SQS daemon removes messages and posts the message bodies to a predefined URL. Additionally, I don't like the idea of having a completely separate Django app that needs to import the models from the main app and must be updated and deployed separately whenever the tasks in the main app are modified.
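For reference, the cache + locking workaround from the first bullet usually looks something like the sketch below: every instance's Celerybeat still enqueues the task, but only the worker that wins the cache lock actually runs the body. The decorator accepts any object exposing Django's cache API (`add`/`delete`), e.g. `django.core.cache.cache`; the names and the `timeout` default here are illustrative, not from the question.

```python
import functools

def single_instance_task(cache, timeout=300):
    """Skip execution if another worker already holds the lock.

    `cache` is any Django-cache-like object. On backends such as Redis
    or Memcached, cache.add is atomic: it sets the key only if it does
    not already exist, which is what makes it usable as a lock.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            lock_id = f"lock-{func.__name__}"
            if cache.add(lock_id, "locked", timeout):
                try:
                    return func(*args, **kwargs)
                finally:
                    cache.delete(lock_id)
            return None  # duplicate submission: drop silently
        return wrapper
    return decorator
```

As the bullet points out, this only deduplicates execution; the broker still receives one message per Celerybeat instance, and every task must be wrapped individually.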

How do I use Celery with scheduled tasks in a distributed Elastic Beanstalk environment without task duplication? E.g. how can I make sure that exactly one Celerybeat process is running across all instances at all times (even if the instance currently running Celerybeat crashes)?

Are there any other ways to achieve this? What's the best way to use Elastic Beanstalk's Worker Tier Environment with Django?

wuser92 asked Oct 19 '16


1 Answer

I guess you could single out Celerybeat into a different group.

Your Auto Scaling group runs multiple Django instances, but Celerybeat is not included in the EC2 config of that scaling group.

You should have a different set of instances (or just one instance) for Celerybeat.
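In deployment terms, this split means the autoscaled instances only ever start workers, while a single dedicated instance outside the scaling group runs beat. A minimal sketch of the process commands, assuming a project named `myproject` (illustrative; substitute your own Celery app):

```shell
# On every autoscaled instance (safe to run many copies in parallel):
celery -A myproject worker --loglevel=info

# On ONE dedicated instance outside the Auto Scaling group:
celery -A myproject beat --loglevel=info
```

Workers are stateless consumers and can be duplicated freely; only the beat scheduler must remain a singleton, which is why it gets its own instance.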

eugene answered Oct 27 '22