tl;dr A method decorated with @app.route can't handle concurrent requests when Flask is served behind gunicorn started with multiple workers and threads, while two different methods handle concurrent requests fine. Why is this the case, and how can the same route be served concurrently?
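I have this simple Flask app:

# test.py (the module name matches the test:app argument passed to gunicorn)
from flask import Flask, jsonify
import time

app = Flask(__name__)

@app.route('/foo')
def foo():
    time.sleep(5)  # simulate a slow, blocking request
    return jsonify({'success': True}), 200

@app.route('/bar')
def bar():
    time.sleep(5)  # simulate a slow, blocking request
    return jsonify({'success': False}), 200
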
If I run this via:
gunicorn test:app -w 1 --threads 1
If I quickly open up /bar and /foo in two different tabs in a browser, whichever tab I hit enter on first will load in 5 seconds, and the second tab will load in 10 seconds. This makes sense because gunicorn is running one worker with one thread.
If I run this via either:
gunicorn test:app -w 1 --threads 2
gunicorn test:app -w 2 --threads 1
In this case, opening up /foo and /bar in two different tabs both take 5 seconds. This makes sense, because gunicorn is running either 1 worker with two threads, or two workers with one thread each, and can serve up the two routes at the same time.
However, if I open /foo in two tabs at the same time, the second tab always takes 10 seconds, regardless of the gunicorn configuration.
How can I get the same method decorated by route to serve concurrent requests?
With 5 worker processes of 8 threads each, up to 40 requests can be served concurrently.
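The figure of 40 is just workers multiplied by threads, assuming each thread is tied up by one blocking request (as with time.sleep(5)). A minimal gunicorn.conf.py sketch of that setup might look like this; the bind address is an assumption, and max_concurrent_requests is a hypothetical helper variable for illustration (gunicorn skips names in the config file that are not settings it knows).

```python
# gunicorn.conf.py: a sketch, loaded with: gunicorn -c gunicorn.conf.py test:app
bind = "127.0.0.1:8000"  # assumption: local address/port for testing
workers = 5              # separate worker processes
threads = 8              # threads per worker; threads > 1 makes gunicorn use the gthread worker
loglevel = "debug"       # same effect as --log-level DEBUG on the command line

# Hypothetical helper, not a gunicorn setting: the upper bound on requests
# in flight at once if every request blocks its thread.
max_concurrent_requests = workers * threads  # 5 * 8 = 40
```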
Multitasking is the ability to execute multiple tasks or processes (almost) at the same time, and it improves performance in both blocking and non-blocking web servers. Applications built with Flask, Django, or Tornado can all handle multiple requests simultaneously when run on a suitable server.
For reference, the Flask benchmarks on TechEmpower give 25,000 requests per second.
This problem is probably not caused by Gunicorn or Flask but by the browser.
I just tried to reproduce it. With two Firefox tabs the problem does occur, but if I run two curl processes in different consoles, they are served as expected (in parallel) and their requests are handled by different workers. This can be checked by passing --log-level DEBUG when starting gunicorn.
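The two-curl experiment can be approximated in a few lines of Python: two requests to /foo are sent at the same time, each on its own TCP connection, the way two separate curl processes would. This is a sketch under stated assumptions: instead of gunicorn it uses the standard-library ThreadingHTTPServer as a stand-in, the 5-second sleep is shortened to a hypothetical DELAY of 0.5 s, and the OS picks a free port. With separate connections the total should be close to one DELAY, not two.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DELAY = 0.5  # hypothetical stand-in for the 5-second time.sleep() in /foo


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(DELAY)  # blocks this request's thread, like the Flask view
        body = b'{"success": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def fetch(port):
    # A fresh HTTPConnection per call means a fresh TCP connection, like one curl process.
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", "/foo")
        conn.getresponse().read()
    finally:
        conn.close()


def time_parallel_requests(n=2):
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        start = time.monotonic()
        clients = [threading.Thread(target=fetch, args=(port,)) for _ in range(n)]
        for t in clients:
            t.start()
        for t in clients:
            t.join()
        return time.monotonic() - start
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(f"2 parallel requests took {time_parallel_requests():.2f}s")
```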
I think this is because Firefox (and maybe other browsers) opens a single connection to the server per URL; when you open the same page in two tabs, both requests are sent through the same (kept-alive) connection and therefore reach the same worker.
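The kept-alive connection effect can be illustrated with a similar sketch, again using ThreadingHTTPServer as a stand-in for a gunicorn worker (an assumption; this is not gunicorn's code). The handler speaks HTTP/1.1 so the connection stays open, and it records each request's client port, which identifies the TCP connection. Even though the server itself could run requests in parallel threads, two requests sent through one HTTPConnection must go one after the other, so they take about twice the hypothetical DELAY and both arrive on the same connection.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DELAY = 0.5  # hypothetical stand-in for the 5-second time.sleep() in /foo


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the TCP connection open between requests

    def do_GET(self):
        # client_address[1] is the client's port, i.e. which TCP connection this is.
        self.server.client_ports.append(self.client_address[1])
        time.sleep(DELAY)
        body = b'{"success": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def time_requests_on_one_connection(n=2):
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    server.client_ports = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        start = time.monotonic()
        for _ in range(n):
            conn.request("GET", "/foo")
            # The whole response must be read before the next request can be
            # sent on this connection, so the requests run one after another.
            conn.getresponse().read()
        return time.monotonic() - start, server.client_ports
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    elapsed, ports = time_requests_on_one_connection()
    print(f"2 requests on one connection took {elapsed:.2f}s, client ports: {ports}")
```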
As a result, even an async worker such as eventlet will not help: an async worker can handle multiple connections at a time, but when two requests arrive on the same connection, they are necessarily handled one by one.
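The same limit shows up with an event loop. The sketch below uses asyncio as a rough stand-in for an async worker (an assumption: eventlet uses green threads rather than asyncio, and the toy line-based protocol here stands in for HTTP). Many connections are served concurrently, but the server reads and answers the requests on one connection in order, so two pipelined requests on a single connection take about 2 × DELAY, while one request on each of two connections takes about one DELAY.

```python
import asyncio
import time

DELAY = 0.5  # hypothetical stand-in for the 5-second sleep in /foo


async def handle_connection(reader, writer):
    # One task per connection: connections run concurrently, but the requests
    # on THIS connection are read and answered strictly one at a time.
    while True:
        line = await reader.readline()
        if not line:  # client closed the connection
            break
        await asyncio.sleep(DELAY)  # non-blocking stand-in for the slow view
        writer.write(b"done " + line)
        await writer.drain()
    writer.close()


async def send_requests(port, n):
    # Send n toy "requests" back-to-back (pipelined) on a single connection.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET /foo\n" * n)
    await writer.drain()
    for _ in range(n):
        await reader.readline()
    writer.close()
    await writer.wait_closed()


async def measure():
    server = await asyncio.start_server(handle_connection, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        start = time.monotonic()
        await send_requests(port, 2)  # two requests, one connection
        one_conn = time.monotonic() - start

        start = time.monotonic()
        await asyncio.gather(send_requests(port, 1), send_requests(port, 1))  # two connections
        two_conns = time.monotonic() - start
    return one_conn, two_conns


if __name__ == "__main__":
    one_conn, two_conns = asyncio.run(measure())
    print(f"same connection: {one_conn:.2f}s, separate connections: {two_conns:.2f}s")
```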