I am writing a tool in Python 3.6 that sends requests to several APIs (with various endpoints) and collects their responses to parse and save them in a database.
The API clients that I use have a synchronous way of requesting a URL; for instance they use urllib.request.Request(...), or they use Kenneth Reitz's Requests library.
Since my API calls rely on synchronous requests, they run one after another and the whole process takes several minutes to complete.
Now I'd like to wrap my API calls in async/await (asyncio). I'm using Python 3.6.
All the examples and tutorials that I found want me to change the synchronous URL calls to async versions of them (for instance aiohttp). Since my code relies on API clients that I haven't written (and can't change), I need to leave that code untouched.
So is there a way to wrap my synchronous requests (blocking code) in async/await to make them run in an event loop?
I'm new to asyncio in Python. This would be a no-brainer in NodeJS. But I can't wrap my head around this in Python.
asyncio lets you handle many concurrent (not parallel!) requests with no extra threads at all; everything runs on a single thread inside the event loop. However, requests does not support asyncio, so you need to create threads to get concurrency.
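For instance, here is a minimal sketch of that pattern applied to requests itself (the URLs and the timeout are placeholder choices of mine, not from your code):

import asyncio
import functools

import requests  # the synchronous library we cannot change

async def fetch(loop, url):
    # Run the blocking requests.get call in the default thread pool;
    # the coroutine suspends here without blocking the event loop.
    response = await loop.run_in_executor(
        None, functools.partial(requests.get, url, timeout=10))
    return response.text

async def main(loop):
    urls = ["https://httpbin.org/get", "https://httpbin.org/ip"]
    # Schedule both fetches concurrently; each runs in its own thread.
    return await asyncio.gather(*(fetch(loop, u) for u in urls))

loop = asyncio.get_event_loop()
pages = loop.run_until_complete(main(loop))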
The await keyword passes control back to the event loop; it suspends the execution of the surrounding coroutine. When Python encounters an await f() expression inside g(), await tells the event loop: "Suspend execution of g() until whatever I'm waiting on, the result of f(), is returned."
The await keyword works only inside functions defined with async def. As the example below shows, the async and await keywords result in asynchronous code that looks a lot like synchronous code.
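A minimal illustration of that f()/g() relationship, with asyncio.sleep standing in for real I/O:

import asyncio

async def f():
    await asyncio.sleep(1)  # "wait on I/O"; control returns to the event loop
    return 42

async def g():
    result = await f()  # g() is suspended here until f() produces its result
    print(result)

loop = asyncio.get_event_loop()
loop.run_until_complete(g())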
In general terms, that is the difference between the two basic kinds of methods: a synchronous method completes its work before returning to the caller, while an asynchronous method starts a job in the background and returns to the caller immediately.
The solution is to wrap your synchronous code in a thread and run it that way. I used that exact approach to make my asyncio code run boto3 (note: remove the inline type hints if running on Python < 3.6):
import asyncio
import functools
import typing

import boto3
import botocore.exceptions

# A method of a storage-backend class in the original code; base is the
# surrounding project's module that defines the custom exceptions.
async def get(self, key: str) -> bytes:
    s3 = boto3.client("s3")
    loop = asyncio.get_event_loop()
    try:
        # Run the blocking get_object call in the default thread pool.
        # functools.partial is needed because run_in_executor does not
        # forward keyword arguments to the callable.
        response: typing.Mapping = \
            await loop.run_in_executor(  # type: ignore
                None, functools.partial(
                    s3.get_object,
                    Bucket=self.bucket_name,
                    Key=key))
    except botocore.exceptions.ClientError as e:
        # Translate boto3's error codes into the application's exceptions.
        if e.response["Error"]["Code"] == "NoSuchKey":
            raise base.KeyNotFoundException(self, key) from e
        elif e.response["Error"]["Code"] == "AccessDenied":
            raise base.AccessDeniedException(self, key) from e
        else:
            raise
    return response["Body"].read()
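For completeness, here is a hypothetical sketch of calling that method concurrently; S3Store, its constructor, and the key names are all names I'm assuming for illustration, not part of the code above:

import asyncio

async def main():
    store = S3Store(bucket_name="my-bucket")  # hypothetical class and bucket
    keys = ["a.txt", "b.txt", "c.txt"]
    # Each get() parks its blocking boto3 call in a separate pool thread,
    # so the downloads proceed concurrently.
    return await asyncio.gather(*(store.get(k) for k in keys))

loop = asyncio.get_event_loop()
bodies = loop.run_until_complete(main())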
Note that this will work because the vast majority of the time in the s3.get_object() call is spent waiting for I/O, and (generally) while waiting for I/O Python releases the GIL (the GIL is the reason that threads in Python are generally not a good idea).
The first argument None in run_in_executor means that we run in the default executor, which is a thread pool executor. It may make things clearer to assign a thread pool executor there explicitly.
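For example (a sketch; the pool size of 20 is an arbitrary choice of mine):

import asyncio
import functools
from concurrent.futures import ThreadPoolExecutor

import boto3

# An explicitly sized pool instead of the implicit default executor.
executor = ThreadPoolExecutor(max_workers=20)

async def get_object_bytes(loop, bucket, key):
    s3 = boto3.client("s3")
    # Passing the executor instead of None makes the thread budget explicit.
    response = await loop.run_in_executor(
        executor, functools.partial(s3.get_object, Bucket=bucket, Key=key))
    return response["Body"].read()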
Note that, where pure async I/O could easily keep thousands of connections open concurrently, a thread pool executor means that each concurrent call to the API needs a separate thread. Once you run out of threads in your pool, new calls will not be scheduled until a thread becomes available. You can obviously raise the number of threads, but each thread costs memory; don't expect to be able to go over a couple of thousand.
Also see the Python ThreadPoolExecutor docs for an explanation and some slightly different example code on how to wrap your sync call in async code.