 

Large celery task memory leak

I have a huge celery task that works basically like this:

 from celery import task
 from django.conf import settings

 # get_related_ids, print_memory_usage and MyModel are defined elsewhere in the project.

 @task
 def my_task(id):
     if settings.DEBUG:
         print "Don't run this with debug on."
         return False

     related_ids = get_related_ids(id)

     chunk_size = 500

     # Delete the related objects in chunks and report memory use after each chunk.
     for i in xrange(0, len(related_ids), chunk_size):
         ids = related_ids[i:i+chunk_size]
         MyModel.objects.filter(pk__in=ids).delete()
         print_memory_usage()

I also have a manage.py command that just runs my_task(int(args[0])), so this can either be queued or run on the command line.
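For context, a minimal sketch of what that management command might look like, using the older positional-args style that matches my_task(int(args[0])) (the real command isn't shown; the module path and class name are assumptions):

 # myapp/management/commands/run_my_task.py  (hypothetical location)
 from django.core.management.base import BaseCommand

 from myapp.tasks import my_task  # assumed import path


 class Command(BaseCommand):
     help = "Run my_task synchronously for the given id."

     def handle(self, *args, **options):
         # Calling the task function directly runs it in-process,
         # bypassing the celery worker entirely.
         my_task(int(args[0]))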

When run on the command line, print_memory_usage() reveals a relatively constant amount of memory used.

When run inside celery, print_memory_usage() reveals an ever-increasing amount of memory, continuing until the process is killed (I'm using Heroku with a 1 GB memory limit, but other hosts would have a similar problem). The memory growth appears to correspond to the chunk_size: if I increase the chunk_size, the memory consumption per print increases. This seems to suggest that either celery is logging queries itself, or something else in my stack is.
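For reference, print_memory_usage() isn't shown in the question; a helper along these lines, built on the standard library's resource module, would be enough to observe the growth described above (a sketch, not the asker's actual code):

 import resource

 def print_memory_usage():
     # ru_maxrss is the peak resident set size, reported in kB on Linux
     # (bytes on OS X); it only ever grows, which is fine for spotting a leak.
     print "Peak RSS: %d kB" % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss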

Does celery log queries somewhere else?

Other notes:

  • DEBUG is off.
  • This happens both with RabbitMQ and Amazon's SQS as the queue.
  • This happens both locally and on Heroku (though it doesn't get killed locally due to having 16 GB of RAM.)
  • The task actually goes on to do more things than just deleting objects. Later it creates new objects via MyModel.objects.get_or_create(). This also exhibits the same behavior (memory grows under celery, doesn't grow under manage.py).
asked Sep 20 '13 by GDorn
1 Answer

This turned out not to have anything to do with celery. Instead, it was New Relic's logger that was consuming all of that memory. Despite DEBUG being set to False, it was storing every SQL statement in memory in preparation for sending it to their logging server, and it wouldn't flush that memory until the task fully completed. I don't know whether it still behaves this way.

The workaround was to dispatch a subtask for each chunk of ids, so that any one task only ever deletes a finite number of items before it finishes, as sketched below.
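A minimal sketch of that workaround, reusing the question's @task decorator and helpers (delete_chunk is an illustrative name; the answer doesn't show the exact code):

 from celery import task


 @task
 def delete_chunk(ids):
     # Each chunk is deleted in its own short-lived task, so anything the
     # agent buffers per-task is released as soon as the chunk finishes.
     MyModel.objects.filter(pk__in=ids).delete()


 @task
 def my_task(id):
     related_ids = get_related_ids(id)
     chunk_size = 500
     for i in xrange(0, len(related_ids), chunk_size):
         delete_chunk.delay(related_ids[i:i+chunk_size])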

The reason this wasn't a problem when running it as a management command is that New Relic's logger wasn't integrated into the command framework.

Other proposed solutions attempted to reduce the overhead of the chunking operation, which doesn't help with an O(N) memory problem, or to force the celery task to fail if a memory limit is exceeded (a feature that didn't exist at the time, but might eventually have worked with infinite retries).
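For what it's worth, later celery releases did add a worker-level memory cap that recycles a child process once it passes a threshold. A sketch of that configuration, assuming celery 4 or newer (these settings were not available when this answer was written):

 from celery import Celery

 app = Celery("myproject")  # assumed app name

 # Restart a worker child after 100 tasks or once it exceeds ~200 MB resident.
 # worker_max_memory_per_child is specified in kilobytes.
 app.conf.worker_max_tasks_per_child = 100
 app.conf.worker_max_memory_per_child = 200 * 1024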

answered Sep 22 '22 by GDorn