I have to send a lot of HTTP requests; once all of them have returned, the program can continue. Sounds like a perfect match for asyncio. A bit naively, I wrapped my calls to requests in an async function and gave them to asyncio. This doesn't work.
After searching online, I found two solutions: use a library built to work with asyncio, such as aiohttp, or wrap the blocking code in a call to run_in_executor.
To understand this better, I wrote a small benchmark. The server side is a Flask program that waits 0.1 seconds before answering a request.
from flask import Flask
import time

app = Flask(__name__)


@app.route('/')
def hello_world():
    time.sleep(0.1)  # heavy calculations here :)
    return 'Hello World!'


if __name__ == '__main__':
    app.run()
The client is my benchmark:
import requests
from time import perf_counter, sleep
# this is the baseline, sequential calls to requests.get
start = perf_counter()
for i in range(10):
    r = requests.get("http://127.0.0.1:5000/")
stop = perf_counter()
print(f"synchronous took {stop-start} seconds") # 1.062 secs
# now the naive asyncio version
import asyncio
loop = asyncio.get_event_loop()
async def get_response():
    # blocking call: it never yields control back to the event loop
    r = requests.get("http://127.0.0.1:5000/")
start = perf_counter()
loop.run_until_complete(asyncio.gather(*[get_response() for i in range(10)]))
stop = perf_counter()
print(f"asynchronous took {stop-start} seconds") # 1.049 secs
# the fast asyncio version
start = perf_counter()
loop.run_until_complete(asyncio.gather(
    *[loop.run_in_executor(None, requests.get, 'http://127.0.0.1:5000/') for i in range(10)]))
stop = perf_counter()
print(f"asynchronous (executor) took {stop-start} seconds") # 0.122 secs
# finally, aiohttp
import aiohttp

async def get_response(session):
    async with session.get("http://127.0.0.1:5000/") as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        await get_response(session)
start = perf_counter()
loop.run_until_complete(asyncio.gather(*[main() for i in range(10)]))
stop = perf_counter()
print(f"aiohttp took {stop-start} seconds") # 0.121 secs
So, an intuitive implementation with asyncio doesn't deal with blocking I/O code. But if you use asyncio correctly, it is just as fast as the special aiohttp framework. The docs for coroutines and tasks don't really mention this. Only when you read up on loop.run_in_executor() do you find this comment:
# File operations (such as logging) can block the
# event loop: run them in a thread pool.
I was surprised by this behaviour. The purpose of asyncio is to speed up blocking io calls. Why is an additional wrapper, run_in_executor, necessary to do this?
The whole selling point of aiohttp seems to be support for asyncio. But as far as I can see, the requests module works perfectly - as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?
It is important to note that asyncio does not circumvent the GIL; we are still subject to it. If we have a CPU-bound task, we still need to use multiple processes to run it in parallel (which can be driven from asyncio itself); otherwise we will cause performance issues in our application.
One of the cool advantages of asyncio is that it scales far better than threading. Each task takes far fewer resources and less time to create than a thread, so creating and running more of them works well. This example just creates a separate task for each site to download, which works out quite well.
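To make the scaling claim concrete, here is a minimal sketch (not the download example referred to above) that starts 10,000 tasks on a single thread. The names tiny_job and main and the 0.1 s delay are illustrative assumptions; asyncio.sleep stands in for a network wait.

```python
import asyncio
import threading
import time


async def tiny_job(job_id):
    # asyncio.sleep suspends only this task; the event loop keeps
    # running the other tasks on the same thread.
    await asyncio.sleep(0.1)
    return job_id


async def main(n):
    # One task per job, all scheduled on the running event loop.
    tasks = [asyncio.create_task(tiny_job(i)) for i in range(n)]
    threads_in_use = threading.active_count()
    results = await asyncio.gather(*tasks)
    return results, threads_in_use


threads_before = threading.active_count()
start = time.perf_counter()
results, threads_during = asyncio.run(main(10_000))
elapsed = time.perf_counter() - start
print(f"{len(results)} tasks, "
      f"{threads_during - threads_before} extra threads, {elapsed:.2f} s")
```

Starting the same number of OS threads would need one stack per thread, which is the resource cost the paragraph above is pointing at.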
asyncio is a library to write concurrent code using the async/await syntax. asyncio is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web-servers, database connection libraries, distributed task queues, etc.
asyncio has an API for interoperating with Python's multiprocessing library. This lets us use async/await syntax as well as asyncio APIs with multiple processes.
> But as far as I can see, the requests module works perfectly - as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?
Running code in an executor means running it in OS threads.
aiohttp and similar libraries let you run non-blocking code without OS threads, using coroutines only.
If you don't have much work, the difference between OS threads and coroutines is not significant, especially compared to the real bottleneck, the I/O operations. But once you have a lot of work, you will notice that OS threads perform relatively worse because of expensive context switching.
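A small sketch can make the distinction visible by recording which thread each job runs on. The helper names blocking_job and coroutine_job are hypothetical, and nothing here touches the network.

```python
import asyncio
import threading


def blocking_job():
    # Plain function: runs in whichever worker thread the executor picks.
    return threading.get_ident()


async def coroutine_job():
    # Coroutine: runs on the event loop's own thread.
    await asyncio.sleep(0)
    return threading.get_ident()


async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    executor_ids = await asyncio.gather(
        *[loop.run_in_executor(None, blocking_job) for _ in range(3)]
    )
    coroutine_ids = await asyncio.gather(*[coroutine_job() for _ in range(3)])
    return threading.get_ident(), executor_ids, coroutine_ids


loop_thread, executor_ids, coroutine_ids = asyncio.run(main())
print("executor jobs ran off the loop thread:",
      all(i != loop_thread for i in executor_ids))
print("coroutines ran on the loop thread:",
      all(i == loop_thread for i in coroutine_ids))
```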
For example, when I change your code to time.sleep(0.001) and range(100), my machine shows:
asynchronous (executor) took 0.21461606299999997 seconds
aiohttp took 0.12484742700000007 seconds
And this difference will only grow with the number of requests.
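A similar gap can be reproduced without a server. The sketch below is an assumption-laden stand-in, not the benchmark above: time.sleep plays the blocking request, asyncio.sleep plays the non-blocking one, and JOBS and DELAY are made-up values. In this setup most of the gap comes from the default thread pool, which is capped at 32 workers, so 100 blocking jobs run in batches. That is a different effect from context-switching cost, but it also grows with the number of requests.

```python
import asyncio
import time

JOBS = 100     # number of simulated requests (assumed value)
DELAY = 0.05   # simulated server latency in seconds (assumed value)


async def with_executor():
    loop = asyncio.get_running_loop()
    # Blocking sleep in worker threads; the default pool has a limited
    # number of workers, so the jobs are processed in several batches.
    await asyncio.gather(
        *[loop.run_in_executor(None, time.sleep, DELAY) for _ in range(JOBS)]
    )


async def with_coroutines():
    # Non-blocking sleep; all jobs wait at the same time on one thread.
    await asyncio.gather(*[asyncio.sleep(DELAY) for _ in range(JOBS)])


def timed(coro_func):
    # Run one variant in a fresh event loop and return the elapsed time.
    start = time.perf_counter()
    asyncio.run(coro_func())
    return time.perf_counter() - start


executor_time = timed(with_executor)
coroutine_time = timed(with_coroutines)
print(f"executor:   {executor_time:.3f} s")
print(f"coroutines: {coroutine_time:.3f} s")
```

Passing a larger ThreadPoolExecutor to run_in_executor would narrow the gap, at the price of more OS threads.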
> The purpose of asyncio is to speed up blocking io calls.
Nope, the purpose of asyncio is to provide a convenient way to control execution flow. asyncio lets you choose how the flow works: based on coroutines and OS threads (when you use an executor), or on pure coroutines (as aiohttp does).
It's aiohttp's purpose to speed things up, and it copes with the task as shown above :)
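To illustrate the point about control flow, here is a minimal sketch with no network involved. blocking_job mimics a requests.get call placed inside a coroutine, and cooperative_job mimics what an asyncio-aware library does; both names and the 0.1 s delay are assumptions made for the example.

```python
import asyncio
import time


async def blocking_job():
    # time.sleep never yields to the event loop, so no other coroutine
    # can run until it returns.
    time.sleep(0.1)


async def cooperative_job():
    # await hands control back to the loop while this job waits.
    await asyncio.sleep(0.1)


async def run_all(job, n=5):
    # Run n copies of the job "concurrently" and time the whole batch.
    start = time.perf_counter()
    await asyncio.gather(*[job() for _ in range(n)])
    return time.perf_counter() - start


blocking_time = asyncio.run(run_all(blocking_job))
cooperative_time = asyncio.run(run_all(cooperative_job))
print(f"blocking coroutines:    {blocking_time:.2f} s")
print(f"cooperative coroutines: {cooperative_time:.2f} s")
```

The blocking variant takes about as long as running the five jobs one after another, because the async keyword alone does not create any point where the loop can switch to another coroutine.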