
Why doesn't asyncio always use executors?

I have to send a lot of HTTP requests; once all of them have returned, the program can continue. That sounds like a perfect match for asyncio. A bit naively, I wrapped my calls to requests in an async function and gave them to asyncio. This doesn't work.

After searching online, I found two solutions:

  • use a library like aiohttp, which is made to work with asyncio
  • wrap the blocking code in a call to run_in_executor

To understand this better, I wrote a small benchmark. The server side is a Flask program that waits 0.1 seconds before answering a request.

from flask import Flask
import time

app = Flask(__name__)


@app.route('/')
def hello_world():
    time.sleep(0.1)  # heavy calculations here :)
    return 'Hello World!'


if __name__ == '__main__':
    app.run()

The client is my benchmark:

import requests
from time import perf_counter, sleep

# this is the baseline, sequential calls to requests.get
start = perf_counter()
for i in range(10):
    r = requests.get("http://127.0.0.1:5000/")
stop = perf_counter()
print(f"synchronous took {stop-start} seconds") # 1.062 secs

# now the naive asyncio version
import asyncio
loop = asyncio.get_event_loop()

async def get_response():
    r = requests.get("http://127.0.0.1:5000/")

start = perf_counter()
loop.run_until_complete(asyncio.gather(*[get_response() for i in range(10)]))
stop = perf_counter()
print(f"asynchronous took {stop-start} seconds") # 1.049 secs

# the fast asyncio version
start = perf_counter()
loop.run_until_complete(asyncio.gather(
    *[loop.run_in_executor(None, requests.get, 'http://127.0.0.1:5000/') for i in range(10)]))
stop = perf_counter()
print(f"asynchronous (executor) took {stop-start} seconds") # 0.122 secs

# finally, aiohttp
import aiohttp

async def get_response(session):
    async with session.get("http://127.0.0.1:5000/") as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        await get_response(session)

start = perf_counter()
loop.run_until_complete(asyncio.gather(*[main() for i in range(10)]))
stop = perf_counter()
print(f"aiohttp took {stop-start} seconds") # 0.121 secs

So an intuitive implementation with asyncio doesn't deal with blocking I/O code. But if you use asyncio correctly, it is just as fast as the special aiohttp framework. The docs for coroutines and tasks don't really mention this. Only the documentation for loop.run_in_executor() says:

# File operations (such as logging) can block the
# event loop: run them in a thread pool.

I was surprised by this behaviour. The purpose of asyncio is to speed up blocking io calls. Why is an additional wrapper, run_in_executor, necessary to do this?

The whole selling point of aiohttp seems to be support for asyncio. But as far as I can see, the requests module works perfectly - as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?

lhk asked Nov 12 '18 10:11



1 Answer

"But as far as I can see, the requests module works perfectly - as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?"

Running code in an executor means running it in OS threads.

aiohttp and similar libraries let you run non-blocking code without OS threads, using only coroutines.
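To make the distinction concrete, here is a minimal sketch that records which thread each call runs on. The helpers blocking_call and non_blocking_call are hypothetical stand-ins (time.sleep for something like requests.get, asyncio.sleep for something like an aiohttp request), so no server is needed to run it.

```python
import asyncio
import threading
import time


def blocking_call():
    # Hypothetical stand-in for a blocking call such as requests.get.
    time.sleep(0.01)
    return threading.get_ident()


async def non_blocking_call():
    # Hypothetical stand-in for an awaitable call such as an aiohttp request.
    await asyncio.sleep(0.01)
    return threading.get_ident()


async def collect_thread_ids():
    loop = asyncio.get_running_loop()
    # Each blocking call is handed to the default thread pool.
    executor_ids = await asyncio.gather(
        *[loop.run_in_executor(None, blocking_call) for _ in range(5)])
    # Each coroutine runs on the event loop's own thread.
    coroutine_ids = await asyncio.gather(
        *[non_blocking_call() for _ in range(5)])
    return executor_ids, coroutine_ids


main_thread_id = threading.get_ident()
executor_ids, coroutine_ids = asyncio.run(collect_thread_ids())
print("event loop thread:", main_thread_id)
print("executor threads: ", set(executor_ids))
print("coroutine threads:", set(coroutine_ids))
```

The coroutine calls should all report the event loop's thread, while the executor calls report worker threads from the pool.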

If you don't have much work, the difference between OS threads and coroutines is not significant, especially compared to the real bottleneck, the I/O operations. But once you have a lot of work, you will notice that OS threads perform relatively worse because context switching between them is expensive.

For example, when I change your code to time.sleep(0.001) and range(100), my machine shows:

asynchronous (executor) took 0.21461606299999997 seconds
aiohttp took 0.12484742700000007 seconds

And this difference will only grow with the number of requests.
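Besides context-switching cost, one effect that is easy to isolate and that also scales with the number of requests is the size of the thread pool: an executor runs at most max_workers blocking calls at once, and the rest wait in a queue (in CPython 3.8 and later the default pool is capped at min(32, os.cpu_count() + 4) workers). The sketch below is an illustration under assumptions, not a reproduction of the benchmark above: it uses a deliberately small pool (WORKERS = 4, a value chosen only for the demo) and time.sleep / asyncio.sleep in place of real requests.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05   # simulated latency of one request (assumption)
CALLS = 8      # number of simulated requests
WORKERS = 4    # deliberately small pool, chosen only for this demo


async def run_with_executor():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        start = time.perf_counter()
        # Only WORKERS calls sleep at once; the others queue behind them.
        await asyncio.gather(
            *[loop.run_in_executor(pool, time.sleep, DELAY)
              for _ in range(CALLS)])
        return time.perf_counter() - start


async def run_with_coroutines():
    start = time.perf_counter()
    # All coroutines wait concurrently on the single event loop thread.
    await asyncio.gather(*[asyncio.sleep(DELAY) for _ in range(CALLS)])
    return time.perf_counter() - start


executor_elapsed = asyncio.run(run_with_executor())
coroutine_elapsed = asyncio.run(run_with_coroutines())
print(f"executor ({WORKERS} workers) took {executor_elapsed:.3f} seconds")
print(f"coroutines took {coroutine_elapsed:.3f} seconds")
```

With 8 calls and 4 workers the executor needs two rounds of sleeping, whereas the coroutine version needs roughly one. Raising max_workers removes the queueing but adds more threads, which is the trade-off the answer describes.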

"The purpose of asyncio is to speed up blocking io calls."

Nope, the purpose of asyncio is to provide a convenient way to control execution flow. asyncio lets you choose how that flow works: based on coroutines plus OS threads (when you use an executor) or on pure coroutines (as aiohttp does).

It's aiohttp's purpose to speed up things and it copes with the task as shown above :)
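The three options discussed in this thread can be compared side by side without a server. This is a sketch under assumptions: time.sleep stands in for a blocking call such as requests.get and asyncio.sleep for a natively asynchronous one, and the function names are made up for illustration. The naive variant never awaits anything, so the event loop cannot switch to another coroutine and the calls run one after another.

```python
import asyncio
import time

DELAY = 0.05  # simulated latency of one request (assumption)
CALLS = 5


async def naive():
    # Blocking call inside a coroutine: the event loop is stuck until it returns.
    time.sleep(DELAY)


async def via_executor():
    # Coroutines plus OS threads: the blocking call runs in the default pool.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, time.sleep, DELAY)


async def pure_coroutine():
    # Pure coroutines: the wait is handed back to the event loop.
    await asyncio.sleep(DELAY)


async def timed(factory):
    start = time.perf_counter()
    await asyncio.gather(*[factory() for _ in range(CALLS)])
    return time.perf_counter() - start


naive_elapsed = asyncio.run(timed(naive))
executor_elapsed = asyncio.run(timed(via_executor))
coroutine_elapsed = asyncio.run(timed(pure_coroutine))
print(f"naive took {naive_elapsed:.3f} seconds")
print(f"executor took {executor_elapsed:.3f} seconds")
print(f"pure coroutines took {coroutine_elapsed:.3f} seconds")
```

The naive version should take about CALLS * DELAY, matching the sequential timing seen in the question, while the other two finish in roughly one DELAY.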

Mikhail Gerasimov answered Sep 30 '22 18:09