After two days of debugging, I nailed down my time hog: the Python garbage collector.
My application holds a lot of objects in memory. And it works well.
The GC does the usual rounds (I have not played with the default thresholds of (700, 10, 10)).
Once in a while, in the middle of an important transaction, the 2nd generation sweep kicks in and reviews my ~1.5M generation 2 objects.
This takes 2 seconds!
The nominal transaction takes less than 0.1 seconds.
My question is: what should I do?
I can turn off generation 2 sweeps (by setting a very high threshold - is this the right way?) and the GC is obedient.
When should I turn them on?
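For concreteness, the threshold change described above might look like the following. This is only a sketch: the helper names and the GEN2_NEVER value are made up for illustration, and it assumes a CPython version where the third threshold governs full collections.

```python
import gc

# Hypothetical "effectively never" value; any very large int that fits a C int works.
GEN2_NEVER = 10 ** 9


def suppress_gen2_sweeps():
    """Raise only the generation 2 threshold and return the previous thresholds."""
    old = gc.get_threshold()
    gc.set_threshold(old[0], old[1], GEN2_NEVER)
    return old


def restore_thresholds(old):
    """Put back the thresholds returned by suppress_gen2_sweeps()."""
    gc.set_threshold(*old)


old = suppress_gen2_sweeps()
# ... latency-sensitive work; generation 0 and 1 collections still run ...
gc.collect()  # explicit full collection at a moment of my choosing
restore_thresholds(old)
```

Generations 0 and 1 keep their usual thresholds here, so short-lived cyclic garbage is still reclaimed automatically; only the expensive full sweep is deferred.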
We implemented a web service using Django, and each user request takes about 0.1 seconds.
Optimally, I will run these GC gen 2 cycles between user API requests. But how do I do that?
My view ends with return HttpResponse(), AFTER which I would like to run a gen 2 GC sweep.
How do I do that? Does this approach even make sense?
Can I mark the objects that NEVER need to be garbage collected, so the GC will not test them on every gen 2 cycle?
How can I configure the GC to run full sweeps when the Django server is relatively idle?
Python 2.6.6 on multiple platforms (Windows / Linux).
We did something like this for gunicorn. Depending on which WSGI server you use, you need to find the right hooks that run AFTER the response, not before. Django has a request_finished signal, but that signal is still pre-response.
For gunicorn, in the config you need to define 2 methods like so:
    import gc

    def pre_request(worker, req):
        # disable gc until end of request
        gc.disable()

    def post_request(worker, req, environ, resp):
        # enable gc after a request
        gc.enable()
The post_request hook here runs after the HTTP response has been delivered, and so is a very good time for garbage collection.
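One caveat: gc.enable() only turns automatic collection back on; it does not start a sweep by itself, so the next collection still happens whenever the allocation thresholds are next crossed. If you want the full sweep to run right there, between requests, a variant of the hooks above could call gc.collect() explicitly. This is a sketch of that variant, not the exact config we ran:

```python
import gc


def pre_request(worker, req):
    # Suspend automatic collection while the request is being handled.
    gc.disable()


def post_request(worker, req, environ, resp):
    # Run a full (all generations) collection now, while no request is in
    # flight on this worker, then turn automatic collection back on.
    gc.collect()
    gc.enable()
```

The trade-off is that every request now pays for a full sweep afterwards, which keeps the worker busy a little longer before it can pick up the next request.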
I believe one option would be to completely disable garbage collection and then manually collect at the end of a request as suggested here: How does the Garbage Collection mechanism work?
I imagine that you could disable the GC in your settings.py file.
If you want to run garbage collection on every request, I would suggest developing some middleware that does it in the process_response method:
    import gc

    class GCMiddleware(object):
        def process_response(self, request, response):
            gc.collect()
            return response
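Note that process_response still runs before the response goes out to the client, so the collection time is added to the request. If that matters, one possible workaround is to move the collection down to the WSGI layer: PEP 3333 requires the server to call close() on the response iterable once it is done with it, so a wrapper can collect there. The class names below are hypothetical, and exactly when close() fires relative to the client receiving the last byte depends on the server, so treat this as a rough sketch:

```python
import gc


class _CollectOnClose(object):
    """Wraps a WSGI response iterable and collects once the server closes it."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def close(self):
        try:
            # Forward close() to the wrapped iterable if it defines one.
            inner_close = getattr(self._iterable, "close", None)
            if inner_close is not None:
                inner_close()
        finally:
            gc.collect()  # full collection, after the body has been consumed


class GCAfterResponse(object):
    """Hypothetical WSGI middleware; wrap the WSGI application object with it."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        return _CollectOnClose(self.app(environ, start_response))
```

You would wrap the application object your server loads, for example application = GCAfterResponse(application) in the WSGI entry module.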
An alternative might be to disable GC altogether, and configure mod_wsgi (or whatever you're using) to kill and restart processes more frequently.
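On the question of marking objects that never need collecting: as far as I know nothing like that exists in Python 2.6, but Python 3.7 added gc.freeze(), which moves every object the collector currently tracks into a permanent generation that later collections skip. A minimal sketch, assuming Python 3.7 or newer and using a made-up long_lived list as a stand-in for the data loaded at startup:

```python
import gc

# Hypothetical stand-in for the long-lived data loaded at startup.
long_lived = [[i] for i in range(1000)]

gc.collect()  # clear out existing garbage first so it is not frozen too
gc.freeze()   # Python 3.7+: move all currently tracked objects to the permanent generation

frozen = gc.get_freeze_count()  # number of objects now in the permanent generation
```

Objects created after the call are tracked and collected as usual; gc.unfreeze() moves the frozen objects back into the oldest generation.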