 

Deploy strategy for Celery in ECS

The way we currently deploy Celery in ECS is by calling update-service on every code change. This works fine as far as swapping out the old code for the new.

The problem comes with long-running Celery tasks: the deploy kills them, because ECS only gives a container 30 seconds to shut down (you can increase that to 10 minutes, but even that isn't long enough in some cases). The killed tasks are successfully restarted by the new Celery worker(s), but if you deploy once an hour and a task takes 1.5 hours to finish, it will never complete.

Ideally the deploy would tell the existing Celery worker(s) to stop gracefully, i.e. finish running tasks but don't start any new ones. Then it would start new worker containers with the new code, so you have old and new running at the same time. Then, when the long-running tasks have finished, the containers with the old code would be removed.

This seems like a problem that must have been encountered by others but I can't find anything describing this. Scripting this probably wouldn't be too bad but it feels like we'd be working around ECS to do it. Any pointers or ideas to help figure this out would be great. Thanks!

Brad asked Oct 30 '25 08:10


1 Answer

Well, the way we wound up doing this, which is working quite well, is by calling shutdown on the worker explicitly during our deploy process, rather than doing the deploy via ECS.

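import logging
from typing import Any, Optional
from celery import app  # assumed import: celery.app.default_app proxies the default Celery app
from django.core.management.base import BaseCommand

logger = logging.getLogger(__name__)

class Command(BaseCommand):
    help = "Shutdown sunflower worker"
    def handle(self, *args: Any, **options: Any) -> Optional[str]:
        logger.info("Shutting down sunflower worker")
        app.default_app.control.shutdown()  # broadcast a shutdown request to all workers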

This avoids the ECS stop timeout and lets any running tasks keep running to completion. Each worker stops once its in-flight tasks finish, and workers that aren't running any tasks stop immediately. We set up our task definition in ECS to always use the latest container image, so when a worker restarts it comes up on the new code, which is equivalent to doing a deploy.
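If you want the deploy log to show which tasks were still in flight when the shutdown went out, one option is to take a snapshot just before broadcasting it. The following is only a minimal sketch, not what we actually run: `summarize_active` and `shutdown_with_report` are hypothetical helper names, and `control` is assumed to be a Celery control object such as `app.control`, whose `inspect().active()` call returns a mapping of worker hostname to a list of task-info dicts (or `None` if no worker replied in time).

```python
from typing import Any, Dict, List, Optional


def summarize_active(
    active: Optional[Dict[str, List[Dict[str, Any]]]]
) -> Dict[str, List[str]]:
    """Map each worker hostname to the names of the tasks it is running.

    `active` is the reply from `control.inspect().active()`; it is None
    when no worker answered, which is treated here as "nothing reported".
    """
    summary: Dict[str, List[str]] = {}
    for worker, tasks in (active or {}).items():
        summary[worker] = [task.get("name", "<unknown>") for task in tasks]
    return summary


def shutdown_with_report(control: Any) -> Dict[str, List[str]]:
    """Snapshot in-flight tasks, then broadcast a shutdown to all workers.

    `control` is assumed to behave like Celery's `app.control`
    (hypothetical wiring; pass whatever app object your project uses).
    """
    running = summarize_active(control.inspect().active())
    control.shutdown()
    return running
```

Inside the management command, this could be called as `shutdown_with_report(app.default_app.control)` and the returned dict logged. The snapshot is taken before the shutdown on purpose, since a worker that is already shutting down may no longer answer inspect requests.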

You lose out on some of the niceties of an ECS deploy but it's close enough, and definitely better than killing long-running tasks.

Brad answered Nov 02 '25 23:11


