I'm running a Flask app on Heroku using gunicorn with eventlet workers. A particular route on my app frequently receives POST data (x-www-form-urlencoded) with some fairly chunky fields -- on the order of 500KB at most.
This works fine when running locally, but on Heroku, requests to that route takes anywhere from 5 to 30 seconds to complete -- and almost 100% of the time is spent in the first access to request.form:
t = time.time()
action = str(request.form['action'])
dt = time.time() - t # Often 10 seconds or more!
This is confirmed by Newrelic slow request tracing as well. There's a few milliseconds here or there for the database operations, and then a huge chunk of time in Python code, apparently spent waiting on some i/o, since the reported CPU time is usually less than a millisecond.
I have been totally unable to reproduce this in a local environment using the same gunicorn/eventlet setup I'm using in production. Even the built-in debug WSGI server is lightning-fast on these requests.
Does anybody have any idea what might be going wrong? Is it a problem with Flask, or something I just need to contact Heroku support about?
I think I figured out exactly what was going on. TL;DR it wasn't actually slow on the server end at all, I was just misled by Newrelic's reported response times!
I tried running the same code on dotCloud's sandbox as a @AllanAnderson suggested. I first created a reduced test case: a simple HTML form with a few hidden fields pre-loaded with around 900KB of data, and a view function that does nothing except read from the request.form dictionary and measure the elapsed time for each access using time.time().
On Heroku, the results looked like this:
5.87100 seconds: read field "p1": 786432 bytes
0.00019 seconds: read field "p2": 131072 bytes
0.00003 seconds: read field "p3": 12288 bytes
0.00001 seconds: read field "p4": 1024 bytes
And on dotCloud:
0.00096 seconds: read field "p1": 786432 bytes
0.00019 seconds: read field "p2": 131072 bytes
0.00003 seconds: read field "p3": 12288 bytes
0.00001 seconds: read field "p4": 1024 bytes
However, both tests seemed to take the same amount of time in my browser... and by now you've probably guessed the real answer to this "problem." :-)
It turns out that gunicorn on Heroku was executing the view function as soon as the headers were received, and the first access to request.form blocked until the rest of the request was received. So Newrelic saw all these ridiculously slow response times that were really just the result of uploading POST data over a crappy network connection. dotCloud's setup apparently just waits until the whole request has been received.
This makes Newrelic's metrics less useful, but it's not actually a problem with end user experience.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With