
Django Python Garbage Collection woes

After 2 days of debugging, I nailed down my time hog: the Python garbage collector.
My application holds a lot of objects in memory. And it works well.
The GC does the usual rounds (I have not played with the default thresholds of (700, 10, 10)).
Once in a while, in the middle of an important transaction, the 2nd generation sweep kicks in and reviews my ~1.5M generation 2 objects.
This takes 2 seconds! The nominal transaction takes less than 0.1 seconds.

My question is what should I do?
I can turn off generation 2 sweeps (by setting a very high threshold - is this the right way?) and the GC is obedient.
When should I turn them on?
We implemented a web service using Django, and each user request takes about 0.1 seconds.
Ideally, I would run these gen 2 GC cycles between user API requests. But how do I do that?
My view ends with return HttpResponse(), AFTER which I would like to run a gen 2 GC sweep.
How do I do that? Does this approach even make sense?

Can I mark the objects that NEVER need to be garbage collected, so the GC will not test them on every gen 2 cycle?
How can I configure the GC to run full sweeps when the Django server is relatively idle?

Python 2.6.6 on multiple platforms (Windows / Linux).

Tal Weiss asked Jan 04 '11 14:01


3 Answers

We did something like this for gunicorn. Depending on which WSGI server you use, you need to find the right hook that runs AFTER the response, not before. Django has a request_finished signal, but that signal still fires before the response is sent.

For gunicorn, you need to define two hook functions in the config file, like so:

import gc


def pre_request(worker, req):
    # disable gc until end of request
    gc.disable()


def post_request(worker, req, environ, resp):
    # enable gc after a request
    gc.enable()

The post_request hook here runs after the HTTP response has been delivered, so it is a very good time for garbage collection.
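
Note that gc.enable() only switches automatic collection back on; it does not run a collection by itself. A minimal sketch of a variant that forces the full sweep inside post_request, assuming the same gunicorn hook signatures as above (the explicit gc.collect() call is an addition, not part of the original answer):

```python
import gc


def pre_request(worker, req):
    # Suspend automatic collection for the duration of the request.
    gc.disable()


def post_request(worker, req, environ, resp):
    # Assumption: we want the expensive full sweep to happen right here,
    # so collect explicitly instead of waiting for a threshold to trip.
    gc.collect()
    # Restore automatic collection in case anything runs between requests.
    gc.enable()
```

With this variant the cost of the sweep is paid on every request, but after the response rather than in the middle of the transaction.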

dalore answered Oct 05 '22 07:10


I believe one option would be to completely disable garbage collection and then manually collect at the end of a request as suggested here: How does the Garbage Collection mechanism work?

I imagine that you could disable the GC in your settings.py file.

If you want to run garbage collection on every request, I would suggest writing some middleware that does it in the process_response method:

import gc


class GCMiddleware(object):
    def process_response(self, request, response):
        # Run a full collection once the view has produced its response.
        gc.collect()
        return response
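
process_response runs while the response is still on its way back to the server, so the collection time is still paid before the client gets its answer. One possible way to push it later, sketched here under the assumption that the application is served through a plain WSGI server, is to wrap the WSGI application and collect in the close() method of the returned iterable, which PEP 3333 requires the server to call once the request is finished. The names gc_after_response and _CollectOnClose are hypothetical, and whether the client has actually received the bytes by that point depends on the server.

```python
import gc


class _CollectOnClose(object):
    """Wraps a WSGI response iterable and collects when it is closed."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def close(self):
        try:
            # Honour the wrapped iterable's own close(), if it has one.
            inner_close = getattr(self._iterable, "close", None)
            if inner_close is not None:
                inner_close()
        finally:
            # Full collection after the server is done with the body.
            gc.collect()


def gc_after_response(app):
    """Return a WSGI app that behaves like `app` but collects on close()."""
    def wrapped(environ, start_response):
        return _CollectOnClose(app(environ, start_response))
    return wrapped
```

In a Django project this would wrap the WSGI application object in the WSGI entry script, for example application = gc_after_response(application).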

milkypostman answered Oct 05 '22 05:10


An alternative might be to disable GC altogether, and configure mod_wsgi (or whatever you're using) to kill and restart processes more frequently.
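
For gunicorn, which the first answer uses, this idea could look roughly like the config sketch below. max_requests, max_requests_jitter and the post_fork hook are gunicorn settings; the numbers are placeholders, not recommendations. mod_wsgi has a comparable maximum-requests option on WSGIDaemonProcess.

```python
import gc

# Recycle each worker after this many requests (placeholder value), so the
# memory held by uncollected reference cycles is released on process exit.
max_requests = 1000
# Random spread so that workers do not all restart at the same moment.
max_requests_jitter = 50


def post_fork(server, worker):
    # Runs in each freshly forked worker: turn the cyclic collector off.
    # Reference counting still frees objects that are not part of a cycle.
    gc.disable()
```

How low max_requests has to be depends on how quickly cyclic garbage accumulates in the application, which only measurement can tell.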

Daniel Roseman answered Oct 05 '22 06:10