I wrote a FastAPI app, and now I am thinking about deploying it. However, I get strange, unexpected performance differences depending on whether I use uvicorn or gunicorn. In particular, all code (even standard-library, pure-Python code) seems to get slower if I use gunicorn. For performance debugging I wrote a small app that demonstrates this:
import asyncio, time
from fastapi import FastAPI, Path
from datetime import datetime

app = FastAPI()

@app.get("/delay/{delay1}/{delay2}")
async def get_delay(
    delay1: float = Path(..., title="Nonblocking time taken to respond"),
    delay2: float = Path(..., title="Blocking time taken to respond"),
):
    total_start_time = datetime.now()
    times = []
    for i in range(100):
        start_time = datetime.now()
        await asyncio.sleep(delay1)
        time.sleep(delay2)
        times.append(str(datetime.now() - start_time))
    return {"delays": [delay1, delay2],
            "total_time_taken": str(datetime.now() - total_start_time),
            "times": times}
Running the FastAPI app with:
gunicorn api.performance_test:app -b localhost:8001 -k uvicorn.workers.UvicornWorker --workers 1
the response body of a GET to http://localhost:8001/delay/0.0/0.0 is consistently something like:
{
  "delays": [
    0.0,
    0.0
  ],
  "total_time_taken": "0:00:00.057946",
  "times": [
    "0:00:00.000323",
    ...similar values omitted for brevity...
    "0:00:00.000274"
  ]
}
However using:
uvicorn api.performance_test:app --port 8001
I consistently get timings like these:
{
  "delays": [
    0.0,
    0.0
  ],
  "total_time_taken": "0:00:00.002630",
  "times": [
    "0:00:00.000037",
    ...snip...
    "0:00:00.000020"
  ]
}
The difference becomes even more pronounced when I comment out the await asyncio.sleep(delay1) statement.
So I am wondering what gunicorn/uvicorn do to the Python/FastAPI runtime to create this factor-10 difference in code execution speed.
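One way to see what actually differs at runtime is to inspect which event loop implementation each setup ends up with. A minimal diagnostic sketch, assuming it is added to the same app object as above:

import asyncio

@app.get("/loop")
async def which_loop():
    # Report the concrete event loop class the server is running,
    # e.g. uvloop's Loop vs. the stdlib selector event loop.
    loop = asyncio.get_running_loop()
    return {"loop": f"{type(loop).__module__}.{type(loop).__name__}"}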
For what it is worth, I performed these tests using Python 3.8.2 on macOS 11.2.3 with an Intel i7 processor.
And these are the relevant parts of my pip freeze output:
fastapi==0.65.1
gunicorn==20.1.0
uvicorn==0.13.4
Gunicorn by itself is not compatible with FastAPI, as FastAPI uses the newest ASGI standard. But Gunicorn can work as a process manager, letting users tell it which specific worker process class to use; Gunicorn then starts one or more worker processes using that class.
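For example, a sketch with several workers (the module path is the one from the question; the worker count of 4 is just an illustrative choice):

gunicorn api.performance_test:app -b localhost:8001 -k uvicorn.workers.UvicornWorker --workers 4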
Running with Gunicorn: for production deployments, the recommendation is to use gunicorn with the uvicorn worker class; for a PyPy compatible configuration, use uvicorn.workers.UvicornH11Worker.
The main thing you need to run a FastAPI application on a remote server machine is an ASGI server program like Uvicorn. There are 3 main alternatives: Uvicorn, a high-performance ASGI server; Hypercorn, an ASGI server compatible with HTTP/2 and Trio; and Daphne, the ASGI server built for Django Channels.
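For completeness, a typical setup (note that the [standard] extra pulls in optional speedups such as uvloop, which turns out to matter below):

python -m pip install fastapi 'uvicorn[standard]'
uvicorn api.performance_test:app --port 8001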
My environment: Ubuntu on WSL2 on Windows 10
Relevant parts of my pip freeze output:
fastapi==0.65.1
gunicorn==20.1.0
uvicorn==0.14.0
I modified the code a little:
import asyncio, time
from fastapi import FastAPI, Path
from datetime import datetime
import statistics

app = FastAPI()

@app.get("/delay/{delay1}/{delay2}")
async def get_delay(
    delay1: float = Path(..., title="Nonblocking time taken to respond"),
    delay2: float = Path(..., title="Blocking time taken to respond"),
):
    total_start_time = datetime.now()
    times = []
    for i in range(100):
        start_time = datetime.now()
        await asyncio.sleep(delay1)
        time.sleep(delay2)
        time_delta = (datetime.now() - start_time).microseconds
        times.append(time_delta)
    times_average = statistics.mean(times)
    return {"delays": [delay1, delay2],
            "total_time_taken": (datetime.now() - total_start_time).microseconds,
            "times_avarage": times_average,
            "times": times}
Apart from the first load of the website, my results for both methods are nearly the same: times are between 0:00:00.000530 and 0:00:00.000620 most of the time for both methods. The first attempt for each takes longer, around 0:00:00.003000.
However, after I restarted Windows and tried those tests again, I noticed I no longer had increased times on the first requests after server startup (I think it is thanks to a lot of free RAM after the restart).
Examples of non-first runs (3 attempts):
# `uvicorn performance_test:app --port 8083`
{"delays":[0.0,0.0],"total_time_taken":553,"times_avarage":4.4,"times":[15,7,5,4,4,4,4,5,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,5,4,4,5,4,4,5,4,4,4,4,4,5,4,5,5,4,4,4,4,4,4,5,4,4,4,5,4,4,4,4,4,4,5,4,4,5,4,4,4,4,5,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,4,5,4]}
{"delays":[0.0,0.0],"total_time_taken":575,"times_avarage":4.61,"times":[15,6,5,5,5,5,5,5,5,5,5,4,5,5,5,5,4,4,4,4,4,5,5,5,4,5,4,4,4,5,5,5,4,5,5,4,4,4,4,5,5,5,5,4,4,4,4,5,5,4,4,4,4,4,4,4,4,5,5,4,4,4,4,5,5,5,5,5,5,5,4,4,4,4,5,5,4,5,5,4,4,4,4,4,4,5,5,5,4,4,4,4,5,5,5,5,4,4,4,4]}
{"delays":[0.0,0.0],"total_time_taken":548,"times_avarage":4.31,"times":[14,6,5,4,4,4,4,4,4,4,5,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,4,5,4,4,4,4,4,4,4,4,5,4,4,4,4,4,4,5,4,4,4,4,4,5,5,4,4,4,4,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4]}
# `gunicorn performance_test:app -b localhost:8084 -k uvicorn.workers.UvicornWorker --workers 1`
{"delays":[0.0,0.0],"total_time_taken":551,"times_avarage":4.34,"times":[13,6,5,5,5,5,5,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,4,4,4,5,4,4,4,4,4,5,4,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,4,5,4,4,4,4,4,4,4,5,4,4,4,4,4,4,4,4,4,5,4,4,5,4,5,4,4,5,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,5]}
{"delays":[0.0,0.0],"total_time_taken":558,"times_avarage":4.48,"times":[14,7,5,5,5,5,5,5,4,4,4,4,4,4,5,5,4,4,4,4,5,4,4,4,5,5,4,4,4,5,5,4,4,4,5,4,4,4,5,5,4,4,4,4,5,5,4,4,5,5,4,4,5,5,4,4,4,5,4,4,5,4,4,5,5,4,4,4,5,4,4,4,5,4,4,4,5,4,5,4,4,4,5,4,4,4,5,4,4,4,5,4,4,4,5,4,4,4,5,4]}
{"delays":[0.0,0.0],"total_time_taken":550,"times_avarage":4.34,"times":[15,6,5,4,4,4,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,4,4,4,5,4,4,4,4,5,5,4,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4]}
Examples of non-first runs with await asyncio.sleep(delay1) commented out (3 attempts):
# `uvicorn performance_test:app --port 8083`
{"delays":[0.0,0.0],"total_time_taken":159,"times_avarage":0.6,"times":[3,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,0]}
{"delays":[0.0,0.0],"total_time_taken":162,"times_avarage":0.49,"times":[3,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,0,1,0,0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1]}
{"delays":[0.0,0.0],"total_time_taken":156,"times_avarage":0.61,"times":[3,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1]}
# `gunicorn performance_test:app -b localhost:8084 -k uvicorn.workers.UvicornWorker --workers 1`
{"delays":[0.0,0.0],"total_time_taken":159,"times_avarage":0.59,"times":[2,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,1,0,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0]}
{"delays":[0.0,0.0],"total_time_taken":165,"times_avarage":0.62,"times":[3,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,1,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1]}
{"delays":[0.0,0.0],"total_time_taken":164,"times_avarage":0.54,"times":[2,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1]}
I made a Python script to benchmark those times more precisely:
import statistics
import requests
from time import sleep

number_of_tests = 1000

sites_to_test = [
    {
        'name': 'only uvicorn ',
        'url': 'http://127.0.0.1:8083/delay/0.0/0.0'
    },
    {
        'name': 'gunicorn+uvicorn',
        'url': 'http://127.0.0.1:8084/delay/0.0/0.0'
    }]

for test in sites_to_test:
    total_time_taken_list = []
    times_avarage_list = []
    requests.get(test['url'])  # first request may be slower, so better to not measure it
    for a in range(number_of_tests):
        r = requests.get(test['url'])
        json = r.json()
        total_time_taken_list.append(json['total_time_taken'])
        times_avarage_list.append(json['times_avarage'])
        # sleep(1)  # results are slightly different with sleep between requests
    total_time_taken_avarage = statistics.mean(total_time_taken_list)
    times_avarage_avarage = statistics.mean(times_avarage_list)
    print({'name': test['name'], 'number_of_tests': number_of_tests,
           'total_time_taken_avarage': total_time_taken_avarage,
           'times_avarage_avarage': times_avarage_avarage})
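Averages can hide tail latency. As a possible extension (not part of the runs below), statistics.quantiles could summarize the spread of the collected values at the end of the per-site loop:

# Sketch: tail-latency summary for the values collected above.
# statistics.quantiles with n=20 returns 19 cut points; the one at
# index 18 is the 95th percentile.
cut_points = statistics.quantiles(total_time_taken_list, n=20)
print({'median': statistics.median(total_time_taken_list),
       'p95': cut_points[18]})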
Results:
{'name': 'only uvicorn ', 'number_of_tests': 2000, 'total_time_taken_avarage': 586.5985, 'times_avarage_avarage': 4.820865}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 571.8415, 'times_avarage_avarage': 4.719035}
Results with await asyncio.sleep(delay1) commented out:
{'name': 'only uvicorn ', 'number_of_tests': 2000, 'total_time_taken_avarage': 151.301, 'times_avarage_avarage': 0.602495}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 144.4655, 'times_avarage_avarage': 0.59196}
I also made another version of the above script which alternates URLs on every request (it gives slightly higher times):
import statistics
import requests
from time import sleep

number_of_tests = 1000

sites_to_test = [
    {
        'name': 'only uvicorn ',
        'url': 'http://127.0.0.1:8083/delay/0.0/0.0',
        'total_time_taken_list': [],
        'times_avarage_list': []
    },
    {
        'name': 'gunicorn+uvicorn',
        'url': 'http://127.0.0.1:8084/delay/0.0/0.0',
        'total_time_taken_list': [],
        'times_avarage_list': []
    }]

for test in sites_to_test:
    requests.get(test['url'])  # first request may be slower, so better to not measure it

for a in range(number_of_tests):
    for test in sites_to_test:
        r = requests.get(test['url'])
        json = r.json()
        test['total_time_taken_list'].append(json['total_time_taken'])
        test['times_avarage_list'].append(json['times_avarage'])
        # sleep(1)  # results are slightly different with sleep between requests

for test in sites_to_test:
    total_time_taken_avarage = statistics.mean(test['total_time_taken_list'])
    times_avarage_avarage = statistics.mean(test['times_avarage_list'])
    print({'name': test['name'], 'number_of_tests': number_of_tests,
           'total_time_taken_avarage': total_time_taken_avarage,
           'times_avarage_avarage': times_avarage_avarage})
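To reproduce, both servers are started first (the commands shown as comments next to the results), then the script is run; the filename here is hypothetical:

# start both servers first, then:
python benchmark_urls.py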
Results:
{'name': 'only uvicorn ', 'number_of_tests': 2000, 'total_time_taken_avarage': 589.4315, 'times_avarage_avarage': 4.789385}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 589.0915, 'times_avarage_avarage': 4.761095}
Results with await asyncio.sleep(delay1) commented out:
{'name': 'only uvicorn ', 'number_of_tests': 2000, 'total_time_taken_avarage': 152.8365, 'times_avarage_avarage': 0.59173}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 154.4525, 'times_avarage_avarage': 0.59768}
This should help you debug your results better. It may help to investigate further if you share more details about your OS / machine. Also, please restart your computer/server, as it may have an impact.
Update 1:
I see that I used a newer version of uvicorn (0.14.0) than the one stated in the question (0.13.4). I also tested with the older version 0.13.4, but the results are similar; I still can't reproduce your results.
Update 2:
I ran some more benchmarks and noticed something interesting: whether uvloop is installed appears to matter far more than which server launches the app.
Whole requirements.txt (with uvloop):
uvicorn==0.14.0
fastapi==0.65.1
gunicorn==20.1.0
uvloop==0.15.2
Results:
{'name': 'only uvicorn ', 'number_of_tests': 500, 'total_time_taken_avarage': 362.038, 'times_avarage_avarage': 2.54142}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 500, 'total_time_taken_avarage': 366.814, 'times_avarage_avarage': 2.56766}
Whole requirements.txt (without uvloop):
uvicorn==0.14.0
fastapi==0.65.1
gunicorn==20.1.0
Results:
{'name': 'only uvicorn ', 'number_of_tests': 500, 'total_time_taken_avarage': 595.578, 'times_avarage_avarage': 4.83828}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 500, 'total_time_taken_avarage': 584.64, 'times_avarage_avarage': 4.7155}
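One way to confirm this directly (a suggested check, not one of the runs above) is uvicorn's --loop flag, which pins the event loop implementation regardless of what is installed:

uvicorn performance_test:app --port 8083 --loop asyncio   # stdlib event loop
uvicorn performance_test:app --port 8083 --loop uvloop    # requires uvloop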
Update 3:
I was using only Python 3.9.5 in this answer.
The difference is due to the underlying web server you use. An analogy could be: two cars, same brand, same options, just a different engine; what's the difference? Web servers are not exactly like cars, but I guess you get the point I'm trying to make.
Basically, gunicorn is a synchronous web server, while uvicorn is an asynchronous web server. Since you're using fastapi and the await keyword, I guess that you already know what asyncio / asynchronous programming is.
I don't know the code differences, so take my answer with a grain of salt, but uvicorn is more performant because of the asynchronous part. My guess about the timing difference is that with an async web server, everything is already configured at startup to handle async functions, whereas a sync web server isn't, and there is some kind of overhead to abstract that part away.
It's not a proper answer, but it gives you a hint on where the difference could lie.
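For context on what "synchronous vs asynchronous web server" means here, a minimal illustration (not the questioner's app) of the two calling conventions: gunicorn natively speaks WSGI, uvicorn speaks ASGI.

def wsgi_app(environ, start_response):
    # WSGI: one plain, blocking function call per request.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from WSGI']

async def asgi_app(scope, receive, send):
    # ASGI: a coroutine per request, so the server can interleave many.
    assert scope['type'] == 'http'
    await send({'type': 'http.response.start', 'status': 200,
                'headers': [(b'content-type', b'text/plain')]})
    await send({'type': 'http.response.body', 'body': b'hello from ASGI'})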