I’ve been digging into FastAPI’s handling of synchronous and asynchronous endpoints, and I’ve come across a few things that I’m trying to understand more clearly, especially with regard to how blocking operations behave in Python.
From what I understand, when a synchronous route (defined with def) is called, FastAPI offloads it to a separate thread from the thread pool to avoid blocking the main event loop. This makes sense, as the thread can be blocked (e.g., time.sleep()), but the event loop itself doesn’t get blocked because it continues handling other requests.
But here’s my confusion: If the function is truly blocking (e.g., it’s waiting for something like time.sleep()), how is the event loop still able to execute other tasks concurrently? Isn’t the Python interpreter supposed to execute just one thread at a time?
Here's an example:
from fastapi import APIRouter
import os
import threading
import asyncio

app = APIRouter()


@app.get('/sync')
def tarefa_sincrona():
    print('Sync')
    total = 0
    for i in range(10223424*1043):
        total += i
    print('Sync task done')


@app.get('/async')
async def tarefa_sincrona():
    print('Async task')
    await asyncio.sleep(5)
    print('Async task done')
If I make two requests — the first one to the sync endpoint and the second one to the async endpoint — almost at the same time, I expected the event loop to be blocked. However, in reality, what happens is that the two requests are executed "in parallel."
If the function is truly blocking (e.g., it’s waiting for something like time.sleep()), how is the event loop still able to execute other tasks concurrently? Isn’t the Python interpreter supposed to execute just one thread at a time?
Only one thread is indeed executed at a time. The flaw in the quoted question is to assume that time.sleep() keeps the thread active - as another answerer has pointed out, it does not.
The TL;DR is that time.sleep() does block the thread, but it contains a C macro that releases the thread's hold on the global interpreter lock (GIL) for the duration of the sleep.
Concurrency in Python (with GIL)
CPython will periodically release the running thread's GIL if there are other threads waiting for execution time.
Voluntarily releasing locks is pretty common. In C-extensions, it's practically mandatory:
Py_BEGIN_ALLOW_THREADS is a macro for { PyThreadState *_save; _save = PyEval_SaveThread();
PyEval_SaveThread() releases the GIL.
time.sleep() voluntarily releases the lock on the global interpreter with the macro mentioned above.
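As a rough illustration of the point above, here is a minimal sketch (the helper names sleeper and run_sleepers are made up for this example) that starts several threads which each call time.sleep(). If the sleeping threads held on to the GIL, the total time would be about the sum of the sleeps; because the lock is released while sleeping, it should be close to a single sleep.

```python
import threading
import time


def sleeper(seconds: float) -> None:
    # time.sleep() blocks only the calling thread; CPython releases the GIL
    # while that thread is asleep, so other threads can run.
    time.sleep(seconds)


def run_sleepers(count: int, seconds: float) -> float:
    """Start `count` sleeping threads and return the total wall-clock time."""
    threads = [
        threading.Thread(target=sleeper, args=(seconds,)) for _ in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_sleepers(count=4, seconds=0.5)
    # Expect roughly one sleep's worth of time, not four.
    print(f"4 threads x 0.5s sleep took {elapsed:.2f}s in total")
```

Exact timings will vary by machine, but the total should stay well below count * seconds.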
Synchronous threading:
As mentioned earlier, Python will regularly try to release the GIL so that other threads can get a bit of execution time.
For threads with a varied workload, this is smart. If a thread is waiting for I/O but the code doesn't voluntarily release GIL, this method will still result in the GIL being swapped to a new thread.
For threads that are entirely or primarily CPU-bound, it works... but it doesn't speed up execution. I'll include code that proves this at the end of the post.
The reason it doesn't provide a speed-up in this case is that CPU-bound operations aren't waiting on anything, so sleeping func_1 to give execution time to func_2 just means that func_1 is idle for no reason - with the result that func_1's potential completion time gets staggered by the amount of execution time granted to func_2.
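To make the "no speed-up" claim concrete, the following sketch (helper names are hypothetical, and it assumes a regular GIL-enabled CPython build) runs the same CPU-bound loop twice, first back to back and then in two threads, and prints both wall-clock times. On such a build the two totals are typically close to each other.

```python
import threading
import time


def count_up(n: int, results: list, index: int) -> None:
    # Pure-Python CPU-bound loop: it never waits on I/O and never
    # voluntarily releases the GIL.
    total = 0
    for i in range(n):
        total += i
    results[index] = total


def run_sequential(n: int, jobs: int):
    """Run the jobs one after another in the current thread."""
    results = [None] * jobs
    start = time.perf_counter()
    for j in range(jobs):
        count_up(n, results, j)
    return results, time.perf_counter() - start


def run_threaded(n: int, jobs: int):
    """Run the same jobs, one thread per job."""
    results = [None] * jobs
    threads = [
        threading.Thread(target=count_up, args=(n, results, j))
        for j in range(jobs)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = run_sequential(5_000_000, jobs=2)
    _, thr_time = run_threaded(5_000_000, jobs=2)
    print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```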
Inside of an event loop:
asyncio's event loop is single-threaded, which is to say that it doesn't spawn new threads. Each coroutine that runs uses the main thread (the same thread the event loop lives in). The way this works is that the event loop and its coroutines cooperate: a coroutine hands control back to the loop at each await, so the thread (and its hold on the GIL) is passed among them voluntarily rather than by preemption.
But why aren't coroutines offloaded to threads, so that CPython can step in and release the GIL to other threads?
Many reasons, but the easiest to grasp is maybe this: In practice that would have meant running the risk of significantly lagging the event loop. Because instead of immediately resuming its own tasks (which is to spawn a new coroutine) when the current coroutine finishes, it now possibly has to wait for execution time due to the GIL having been passed off elsewhere. Similarly, coroutines would take longer to finish due to constant context-switching.
Which is a long-winded way of saying that if time.sleep() didn't release its lock, or if you were running a long CPU-bound thing, a single thread would indeed block the entire event loop (by hogging the GIL).
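The effect on the event loop can be observed with a small sketch like the one below. It is not FastAPI code; heartbeat, blocking_call, offloaded_call and count_ticks are names invented for this illustration, and asyncio.to_thread assumes Python 3.9 or newer. A heartbeat coroutine records a tick every 50 ms while some work runs: when the work calls time.sleep() directly inside a coroutine, the loop thread is stuck and the heartbeat stalls; when the same call is pushed to a worker thread, the heartbeat keeps ticking.

```python
import asyncio
import time


async def heartbeat(ticks: list, interval: float = 0.05) -> None:
    # Records a timestamp each time the event loop gives it a turn.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def blocking_call() -> None:
    # Runs on the event loop's own thread and never awaits,
    # so nothing else on the loop can run until it returns.
    time.sleep(0.3)


async def offloaded_call() -> None:
    # Same blocking call, but executed in a worker thread.
    await asyncio.to_thread(time.sleep, 0.3)


async def count_ticks(work) -> int:
    """Run `work` next to a heartbeat and return how many ticks were recorded."""
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    await work()
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return len(ticks)


if __name__ == "__main__":
    print("ticks while blocking the loop:", asyncio.run(count_ticks(blocking_call)))
    print("ticks while offloaded to a thread:", asyncio.run(count_ticks(offloaded_call)))
```

This mirrors, in miniature, why a def endpoint run in a thread pool leaves the loop responsive while an async def endpoint that never awaits does not.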
So what now?
Inside of GIL-bound Python, whether it's sync or async, the only way to execute CPU-bound code (that doesn't actively release its lock) in true parallel is at the process level, so either multiprocessing or concurrent.futures.ProcessPoolExecutor, as each process will have its own GIL.
So:
async functions running CPU-bound code (with no voluntary yields) will run to completion before the event loop gets control back.
sync functions in separate threads running CPU-bound code with no voluntary yields will get paused periodically, and the GIL gets passed off elsewhere.
(For clarity:) sync functions in the same thread will have no concurrency whatsoever.
multiprocessing docs also hint very clearly at the above descriptions:
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
As well as threading docs:
threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously
Reading between the lines, this is much the same as saying that tasks bound by anything other than I/O won't achieve any noteworthy concurrency through threading.
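For completeness, here is a minimal sketch of the process-level approach mentioned above, using concurrent.futures.ProcessPoolExecutor. The function names cpu_task and run_in_processes are hypothetical, and the loop size is arbitrary. Each job reports the PID it ran in, which should differ from job to job since every worker process has its own interpreter and its own GIL.

```python
from concurrent.futures import ProcessPoolExecutor
import os


def cpu_task(n: int) -> tuple[int, int]:
    # CPU-bound work; returns the PID of the process that ran it
    # together with the computed sum.
    total = 0
    for i in range(n):
        total += i
    return os.getpid(), total


def run_in_processes(n: int, jobs: int) -> list:
    """Run `jobs` copies of cpu_task, one worker process per job."""
    with ProcessPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(cpu_task, [n] * jobs))


if __name__ == "__main__":
    # The __main__ guard matters: worker processes may re-import this module.
    for pid, total in run_in_processes(5_000_000, jobs=2):
        print(f"PID {pid} computed {total}")
```

Inside an async endpoint, one plausible way to use such a pool without blocking the loop is loop.run_in_executor(pool, cpu_task, n), though process start-up and pickling overhead mean this only pays off for genuinely heavy work.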
Testing it yourself:
# main.py
from fastapi import FastAPI
import time
import os
import threading

app = FastAPI()


def bind_cpu(id: int):
    thread_id = threading.get_ident()
    print(f"{time.perf_counter():.4f}: BIND GIL for ID: {id}, internals: PID({os.getpid()}), thread({thread_id})")
    start = time.perf_counter()
    total = 0
    for i in range(100_000_000):
        total += i
    end = time.perf_counter()
    print(f"{time.perf_counter():.4f}: REL GIL for ID: {id}, internals: PID({os.getpid()}), thread({thread_id}). Duration: {end-start:.4f}s")
    return total


def endpoint_handler(method: str, id: int):
    print(f"{time.perf_counter():.4f}: Worker reads {method} endpoint with ID: {id} - internals: PID({os.getpid()}), thread({threading.get_ident()})")
    result = bind_cpu(id)
    print(f"{time.perf_counter():.4f}: Worker finished ID: {id} - internals: PID({os.getpid()}), thread({threading.get_ident()})")
    return f"ID: {id}, {result}"


@app.get("/async/{id}")
async def async_endpoint_that_gets_blocked(id: int):
    return endpoint_handler("async", id)


@app.get("/sync/{id}")
def sync_endpoint_that_gets_blocked(id: int):
    return endpoint_handler("sync", id)


if __name__ == "__main__":
    import uvicorn
    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True, workers=1)
# test.py
import asyncio
import httpx
import time


async def send_requests():
    async with httpx.AsyncClient(timeout=httpx.Timeout(25.0)) as client:
        tasks = []
        for i in range(1, 5):
            print(f"{time.perf_counter():.4f}: Sending HTTP request for id: {i}")
            if i % 2 == 0:
                tasks.append(client.get(f"http://localhost:8000/async/{i}"))
            else:
                tasks.append(client.get(f"http://localhost:8000/sync/{i}"))
        responses = await asyncio.gather(*tasks)
        for response in responses:
            print(f"{time.perf_counter():.4f}: {response.text}")


asyncio.run(send_requests())
Start the server (python main.py), then run the client (python test.py).
You will get results looking something like this:
[...]
INFO: Waiting for application startup.
INFO: Application startup complete.
10755.6897: Sending HTTP request for id: 1
10755.6900: Sending HTTP request for id: 2
10755.6902: Sending HTTP request for id: 3
10755.6904: Sending HTTP request for id: 4
10755.9722: Worker reads async endpoint with ID: 4 - internals: PID(24492), thread(8972)
10755.9725: BIND GIL for ID: 4, internals: PID(24492), thread(8972)
10759.4551: REL GIL for ID: 4, internals: PID(24492), thread(8972). Duration: 3.4823s
10759.4554: Worker finished ID: 4 - internals: PID(24492), thread(8972)
INFO: 127.0.0.1:56883 - "GET /async/4 HTTP/1.1" 200 OK
10759.4566: Worker reads async endpoint with ID: 2 - internals: PID(24492), thread(8972)
10759.4568: BIND GIL for ID: 2, internals: PID(24492), thread(8972)
10762.6428: REL GIL for ID: 2, internals: PID(24492), thread(8972). Duration: 3.1857s
10762.6431: Worker finished ID: 2 - internals: PID(24492), thread(8972)
INFO: 127.0.0.1:56884 - "GET /async/2 HTTP/1.1" 200 OK
10762.6446: Worker reads sync endpoint with ID: 3 - internals: PID(24492), thread(22648)
10762.6448: BIND GIL for ID: 3, internals: PID(24492), thread(22648)
10762.6968: Worker reads sync endpoint with ID: 1 - internals: PID(24492), thread(9144)
10762.7127: BIND GIL for ID: 1, internals: PID(24492), thread(9144)
10768.9234: REL GIL for ID: 3, internals: PID(24492), thread(22648). Duration: 6.2784s
10768.9338: Worker finished ID: 3 - internals: PID(24492), thread(22648)
INFO: 127.0.0.1:56882 - "GET /sync/3 HTTP/1.1" 200 OK
10769.2121: REL GIL for ID: 1, internals: PID(24492), thread(9144). Duration: 6.4835s
10769.2124: Worker finished ID: 1 - internals: PID(24492), thread(9144)
INFO: 127.0.0.1:56885 - "GET /sync/1 HTTP/1.1" 200 OK
10769.2138: "ID: 1, 4999999950000000"
10769.2141: "ID: 2, 4999999950000000"
10769.2143: "ID: 3, 4999999950000000"
10769.2145: "ID: 4, 4999999950000000"
Interpretation
Going over the timestamps and the durations, two things are immediately clear:
async endpoints are executing de-facto synchronously
sync endpoints are executing concurrently and finish nearly at the same time BUT each request takes twice as long to complete compared to the async ones
Both of these results are expected, re: the explanations earlier.
The async endpoints become de-facto synchronous because the function we built never yields to the event loop (and hoards the GIL), and so the event loop gets no execution time until the coroutine returns.
The sync endpoints become faux-asynchronous because CPython's thread scheduling swaps the GIL between them roughly every 5 ms (the default switch interval), which means that the first request increments by x%, then the second request increments by x% - repeat until both finish at roughly the same time.
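The interleaving described above can be checked outside FastAPI with a sketch along these lines. The helper names spin and overlapping_spans are made up for this example. It prints the interpreter's switch interval via sys.getswitchinterval() and records when each of two CPU-bound threads started and finished; if the threads are being swapped in and out, both should start before either one ends.

```python
import sys
import threading
import time


def spin(n: int, spans: dict, name: str) -> None:
    # CPU-bound loop that records its own start and end time.
    start = time.perf_counter()
    total = 0
    for i in range(n):
        total += i
    spans[name] = (start, time.perf_counter())


def overlapping_spans(n: int) -> dict:
    """Run two CPU-bound threads and return their (start, end) times."""
    spans: dict = {}
    threads = [
        threading.Thread(target=spin, args=(n, spans, f"t{k}")) for k in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return spans


if __name__ == "__main__":
    print(f"switch interval: {sys.getswitchinterval()} s")
    spans = overlapping_spans(5_000_000)
    for name, (start, end) in sorted(spans.items()):
        print(f"{name}: ran for {end - start:.2f}s")
    latest_start = max(s for s, _ in spans.values())
    earliest_end = min(e for _, e in spans.values())
    print("threads overlapped:", latest_start < earliest_end)
```

The interval can be changed with sys.setswitchinterval(), but doing so only changes how often the threads trade places, not the total amount of work.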
time.sleep() blocks the current thread, but it doesn't render the interpreter useless, since the time still has to be measured. So the interpreter keeps working.
Think of it like a person looking at their clock and waiting. The person is capable of doing other things, and keeps breathing, for example, but their main focus is to wait for some time. Maybe they are waiting for their meal to cook.
In your scenario, where you use asynchronous code, the Python interpreter just pauses one task and looks at another, so it is not completely useless. Think of it like round-robin scheduling: it works on one task for a limited amount of CPU time (waiting for the sleep in this example), then pauses it and looks at another task. "The function is truly blocking" doesn't mean that it leaves the interpreter unable to do anything else; it just tells it to wait for something.
So the person in our example does some other task, like loading dishes into the dishwasher, and after every 4 dishes placed they check their clock to see if their meal is ready. Cooking the meal is a blocking step in preparing dinner, since you need to wait for it to be cooked, but you can asynchronously load the dishes and check the time to see if the meal is ready.