Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Capture Heroku SIGTERM in Celery workers to shutdown worker gracefully

I've done a ton of research on this, and I'm surprised I haven't found a good answer to this yet anywhere.

I'm running a large application on Heroku, and I have certain celery tasks that run for a very long time processing, and at the end of the task save a result. Every time I redeploy on Heroku, it sends SIGTERM (and eventually, SIGKILL) and kills my running worker. I'm trying to find a way for the worker instance to shut itself down gracefully and re-queue itself for processing later so that eventually we can save the required result instead of losing the queued task.

I cannot find a way that works to have the worker listen for SIGTERM properly. The closest I've gotten, which works when running python manage.py celeryd directly but NOT when emulating Heroku using foreman, is the following:

@app.task(bind=True, max_retries=1)
def slow(self, x):
    try:
        for x in range(100):
            print 'x: ' + unicode(x)
            time.sleep(10)
    except exceptions.MaxRetriesExceededError:
        logger.error('whoa')
    except (exceptions.WorkerShutdown, exceptions.WorkerTerminate) as exc:
        logger.error(u'retrying, ' + unicode(exc))
        raise self.retry(exc=exc, countdown=10)
    except (KeyboardInterrupt, SystemExit) as exc:
        print 'retrying'
        raise self.retry(exc=exc, countdown=10)
    else:
        return x
    finally:
        logger.info('task ended!')

When I start this celery task running within foreman and hit Ctrl+C, the following happens:

^CSIGINT received
22:20:59 system   | sending SIGTERM to all processes
22:20:59 web.1    | exited with code 0
22:21:04 system   | sending SIGKILL to all processes
Killed: 9

So it's clear that none of the celery exceptions, nor the KeyboardInterrupt or SystemExit exceptions I've seen in other posts, properly catch SIGTERM and shut down the worker.

What is the right way to do this?

like image 867
jdotjdot Avatar asked Apr 26 '15 02:04

jdotjdot


2 Answers

Starting in version >= 4, Celery comes with a special feature, just for Heroku, that supports this functionality out of the box:

$ REMAP_SIGTERM=SIGQUIT celery -A proj worker -l info

source: https://devcenter.heroku.com/articles/celery-heroku#using-remap_sigterm

like image 186
xavriley Avatar answered Oct 22 '22 02:10

xavriley


celery was unfortunately not designed to do clean shutdown. EVER. I mean it. celery workers respond to SIGTERM but if a task is incomplete, the worker processes will wait to finish the task and only then exit. In which case, you can send it SIGKILL if the workers don't shut down in a reasonable time but there will be a loss of information in this case i.e. you may not know which jobs remained incomplete.

like image 1
pbhowmick Avatar answered Oct 22 '22 02:10

pbhowmick