 

Slow access to Django's request.body

Sometimes this line in a Django app (hosted with Apache/mod_wsgi) takes a very long time to execute, e.g. 99% of a 6-second request, as measured by New Relic, when the request is submitted by certain mobile clients:

raw_body = request.body

(where request is the incoming HttpRequest)

The questions I have:

  1. What could have slowed down access to request.body so much?
  2. What would be the correct Apache configuration to make it wait until the client has sent the whole payload before invoking Django? Maybe the problem lies in the Apache configuration.

Django's body attribute on HttpRequest is a property, so the question really comes down to what is being done there and how to make it happen outside the Django app, if possible. I want Apache to wait for the full request before handing it to the Django app.

asked May 31 '14 by Tadeck

2 Answers

Regarding (1), Apache passes control to the mod_wsgi handler as soon as the request's headers are available, and mod_wsgi then passes control on to Python. The internal implementation of request.body then calls the read() method, which eventually reaches the implementation within mod_wsgi; that asks Apache for the request body and, if Apache has not yet received it completely, blocks until it is available.
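For reference, a simplified sketch of what the body property does is below; it is not the exact source (see django/http/request.py for that), but it shows why the first access blocks:

    # Simplified sketch of Django's HttpRequest.body property; the real
    # implementation adds error handling and size limits, but the
    # blocking behaviour is the same.
    @property
    def body(self):
        if not hasattr(self, '_body'):
            # self.read() drains wsgi.input, i.e. mod_wsgi's input stream;
            # mod_wsgi in turn blocks until Apache has the full payload.
            self._body = self.read()
        return self._body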

Regarding (2), this is not possible with mod_wsgi alone. At least, the hook processing incoming requests doesn't provide a mechanism to block until the full request is available. Another poster suggested using nginx as a buffering proxy in an answer to this duplicate question.

answered Oct 16 '22 by Phillip


There are two ways you can fix this in Apache.

You can use mod_buffer, available in Apache 2.3 and later, and set BufferSize to the maximum expected payload size. This should make Apache hold the request in memory until the client either finishes sending it or the buffer fills up. A minimal sketch of this setup follows (the 1 MB buffer size is an illustrative value, not a recommendation).
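    # Sketch for Apache >= 2.3 using mod_buffer; BufferSize here is an
    # assumed maximum payload size, tune it to your own traffic.
    LoadModule buffer_module modules/mod_buffer.so

    <Location "/">
        SetInputFilter BUFFER
        BufferSize 1048576
    </Location>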

For older Apache versions (< 2.3), you can use mod_proxy combined with ProxyIOBufferSize, ProxyReceiveBufferSize and a loopback vhost. This involves binding your real vhost to a loopback interface and exposing a proxy vhost which connects back to it, roughly as sketched below. The downside is that this uses twice as many sockets and can make resource calculation difficult.
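A rough sketch of the loopback arrangement, assuming the app runs under mod_wsgi; the port, paths and buffer sizes are all hypothetical placeholders:

    # Public-facing proxy vhost that buffers and forwards to the loopback.
    <VirtualHost *:80>
        ProxyRequests Off
        ProxyPass        / http://127.0.0.1:8080/
        ProxyPassReverse / http://127.0.0.1:8080/
        ProxyIOBufferSize      1048576
        ProxyReceiveBufferSize 1048576
    </VirtualHost>

    # Real vhost, bound to the loopback interface only.
    <VirtualHost 127.0.0.1:8080>
        WSGIScriptAlias / /srv/app/app.wsgi
    </VirtualHost>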

However, the ideal choice would be to enable request/response buffering at your L4/L7 load balancer. For example, haproxy lets you add rules based on req_len, and nginx offers equivalent controls (see the sketch below). Most good commercial load balancers also have an option to buffer requests before sending them on.
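As a sketch, an nginx front end that buffers the whole client body before proxying might look like this (the upstream address and size limits are assumptions; proxy_request_buffering is on by default in nginx 1.7.11+):

    server {
        listen 80;

        location / {
            client_body_buffer_size 1m;    # in-memory buffer before spilling to disk
            client_max_body_size    8m;    # reject anything larger
            proxy_request_buffering on;    # read the full body before proxying
            proxy_pass http://127.0.0.1:8080;
        }
    }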

All three approaches rely on buffering the full request/response payload, and there are performance considerations depending on your use case and available resources. You could cache the entire payload in memory but this may dramatically decrease your maximum concurrent connections. You could choose to write the payload to local storage (preferably SSD), but you are then limited by IO capacity.

You also need to consider file uploads, because these are not a good fit for memory-based payload buffering. In most cases you would handle upload requests in your web server, for example with nginx's HttpUploadModule, and then query nginx for the upload progress rather than handling the upload directly in WSGI. If you are buffering at your load balancer, you may wish to exclude file uploads from the buffering rules, as sketched below.
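For instance, with nginx you could carve out an unbuffered location for uploads; the /upload/ path and upstream address here are hypothetical:

    # Stream uploads through unbuffered while the rest of the site
    # keeps full request buffering.
    location /upload/ {
        proxy_request_buffering off;    # pass the body through as it arrives
        client_max_body_size    1g;     # allow large files on this path only
        proxy_pass http://127.0.0.1:8080;
    }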

You need to understand why this is happening: the problem exists both when receiving a request and when sending a response. It's also a good idea to have these protections in place, not just for scalability, but for security reasons.

answered Oct 16 '22 by sleepycal