
Why do we need `async for` and `async with`?

What's the point of introducing async for and async with? I know there are PEPs for these statements, but they are clearly intended for language designers, not average users like me. A high-level rationale supplemented with examples would be greatly appreciated.

I did some research myself and found this answer:

The async for and async with statements are needed because you would break the yield from/await chain with the bare for and with statements.

The author didn't give an example of how the chain might be broken though, so I'm still confused. Furthermore, I notice that Python has async for and async with, but not async while and async try ... except. This sounds strange because for and with are just syntactic sugar for while and try ... except respectively. I mean, wouldn't async versions of the latter statements allow more flexibility, given that they are the building blocks of the former?

There is another answer discussing async for, but it only covers what it is not for, and didn't say much about what it is for.

As a bonus, are async for and async with syntactic sugar? If they are, what are their verbose equivalent forms?

asked Apr 14 '21 12:04 by nalzok




3 Answers

TLDR: for and with are non-trivial syntactic sugar that encapsulate several steps of calling related methods. This makes it impossible to manually add awaits between these steps – but properly usable async for/with need that. At the same time, this means it is vital to have async support for them.


Why we can't await nice things

Python's statements and expressions are backed by so-called protocols: When an object is used in some specific statement/expression, Python calls corresponding "special methods" on the object to allow customization. For example, x in [1, 2, 3] delegates to list.__contains__ to define what in actually means.
Most protocols are straightforward: There is one special method called for each statement/expression. If the only async feature we have is the primitive await, then we can still make all these "one special method" statements/expressions "async" by sprinkling await at the right place.
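A minimal sketch of such a "one special method" case: the special method stays synchronous and merely returns an awaitable, so a single await around the expression is enough. The RemoteNumber class and its _add helper are hypothetical names used only for illustration.

```python
import asyncio


class RemoteNumber:
    """Hypothetical value whose addition needs an asynchronous step."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __add__(self, other: "RemoteNumber"):
        # The special method itself is synchronous; it only returns
        # a coroutine object that the caller has to await.
        return self._add(other)

    async def _add(self, other: "RemoteNumber") -> int:
        await asyncio.sleep(0)  # stand-in for real asynchronous work
        return self.value + other.value


async def main() -> int:
    # One special method, so one await around the expression suffices.
    return await (RemoteNumber(1) + RemoteNumber(2))


result = asyncio.run(main())
print(result)
```

This works only because + involves a single call; there is no second step that would also have to be awaited.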

In contrast, the for and with statements both correspond to multiple steps: for uses the iterator protocol to repeatedly fetch the __next__ item of an iterator, and with uses the context manager protocol to both enter and exit a context.
The important part is that both have more than one step that might need to be asynchronous. While we could manually sprinkle an await at one of these steps, we cannot hit all of them.

  • The easier case to look at is with: we can look at the __enter__ and __exit__ methods separately.

    We could naively define a synchronous context manager with asynchronous special methods. For entering, this actually works by adding an await strategically:

    with AsyncEnterContext() as acm:
        context = await acm
        print("I entered an async context and all I got was this lousy", context)
    

    However, it already breaks down if we use a single with statement for multiple contexts: We would first enter all contexts at once, then await all of them at once.

    with AsyncEnterContext() as acm1, AsyncEnterContext() as acm2:
        context1, context2 = await acm1, await acm2  # wrong! acm1 must be entered completely before loading acm2
        print("I entered many async contexts and all I got was a rules lawyer telling me I did it wrong!")
    

    Worse, there is just no single point where we could await exiting properly.

While it's true that for and with are syntactic sugar, they are non-trivial syntactic sugar: They make multiple actions nicer. As a result, one cannot naively await individual actions of them. Only a blanket async with and async for can cover every step.
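To make the "every step" part concrete, here is a rough sketch of what async with does behind the scenes, loosely following the expansion described in PEP 492. AsyncResource and the events list are hypothetical helpers for illustration, and the expansion is simplified rather than an exact copy of the interpreter's behaviour.

```python
import asyncio
import sys

events: list[str] = []


class AsyncResource:
    """Hypothetical async context manager that records what happens."""

    async def __aenter__(self) -> str:
        await asyncio.sleep(0)  # entering may suspend
        events.append("enter")
        return "resource"

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        await asyncio.sleep(0)  # exiting may suspend, too
        events.append("exit")
        return False  # do not swallow exceptions


async def sugared() -> None:
    async with AsyncResource() as res:
        events.append(f"body:{res}")


async def desugared() -> None:
    # Approximate manual equivalent: both steps are awaited.
    mgr = AsyncResource()
    res = await type(mgr).__aenter__(mgr)
    try:
        events.append(f"body:{res}")
    except BaseException:
        if not await type(mgr).__aexit__(mgr, *sys.exc_info()):
            raise
    else:
        await type(mgr).__aexit__(mgr, None, None, None)


asyncio.run(sugared())
asyncio.run(desugared())
print(events)
```

Note that the await for exiting sits inside the try/except/else scaffolding, which is exactly the place a plain with statement never exposes to the programmer.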

Why we want to async nice things

Both for and with are abstractions: They fully encapsulate the idea of iteration/contextualisation.

Picking one of the two again, Python's for is the abstraction of internal iteration – for contrast, a while is the abstraction of external iteration. In short, that means the entire point of for is that the programmer does not have to know how iteration actually works.

  • Compare how one would iterate a list using for or while:
    some_list = list(range(20))
    index = 0                      # lists are indexed from 0
    while index < len(some_list):  # lists are indexed up to len-1
        print(some_list[index])    # lists are directly index'able
        index += 1                 # lists are evenly spaced
    
    for item in some_list:         # lists are iterable
        print(item)
    
    The external while iteration relies on knowledge about how lists work concretely: It pulls implementation details out of the iterable and puts them into the loop. In contrast, internal for iteration only relies on knowing that lists are iterable. It would work with any implementation of lists, and in fact any implementation of iterables.

Bottom line is the entire point of for – and with – is not to bother with implementation details. That includes having to know which steps we need to sprinkle with async. Only a blanket async with and async for can cover every step without us knowing which.

Why we need to async nice things

A valid question is why for and with get async variants, but others do not. There is a subtle point about for and with that is not obvious in daily usage: both represent concurrency – and concurrency is the domain of async.

Without going into too much detail, a handwavy explanation is the equivalence of handling routines (()), iterables (for) and context managers (with). As has been established in the answer cited in the question, coroutines are actually a kind of generator. Obviously, generators are also iterables, and in fact we can express any iterable via a generator. The less obvious piece is that context managers are also equivalent to generators: most importantly, contextlib.contextmanager can translate generators to context managers.

To consistently handle all kinds of concurrency, we need async variants for routines (await), iterables (async for) and context managers (async with). Only a blanket async with and async for can cover every step consistently.
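The generator/context manager equivalence can be sketched with the standard library's contextlib.contextmanager and its asynchronous sibling contextlib.asynccontextmanager. The function names and the log list below are hypothetical and only serve the illustration.

```python
import asyncio
from contextlib import asynccontextmanager, contextmanager

log: list[str] = []


@contextmanager
def sync_context(name: str):
    log.append(f"enter {name}")      # runs when the with block is entered
    try:
        yield name                   # the value bound by "as"
    finally:
        log.append(f"exit {name}")   # runs when the with block is left


@asynccontextmanager
async def async_context(name: str):
    await asyncio.sleep(0)           # entering may suspend
    log.append(f"enter {name}")
    try:
        yield name
    finally:
        await asyncio.sleep(0)       # exiting may suspend
        log.append(f"exit {name}")


async def main() -> None:
    with sync_context("sync") as s:
        log.append(f"body {s}")
    async with async_context("async") as a:
        log.append(f"body {a}")


asyncio.run(main())
print(log)
```

The generator version can only await around its yield because async with awaits both the entering and the exiting step on its behalf.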

answered Oct 07 '22 11:10 by MisterMiyagi


async for and async with are a logical continuation of the development from lower to higher levels.

In the past, the for loop in a programming language could only iterate over an array of values linearly indexed 0, 1, 2 ... max.

Python's for loop is a higher-level construct. It can iterate over anything supporting the iteration protocol, e.g. set elements or nodes in a tree - none of them has items numbered 0, 1, 2, ... etc.

The core of the iteration protocol is the __next__ special method. Each successive call returns the next item (which may be a computed value or retrieved data) or signals the end of iteration.
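As a rough illustration of that protocol, a plain for loop can be spelled out by hand with iter() and next(). The function names below are hypothetical, and the hand-written loop is a simplified equivalent (it ignores the optional else clause of for).

```python
def with_for(iterable) -> list:
    seen = []
    for item in iterable:
        seen.append(item)
    return seen


def with_while(iterable) -> list:
    seen = []
    iterator = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)    # calls iterator.__next__()
        except StopIteration:        # the "end of iteration" signal
            break
        seen.append(item)
    return seen


numbers = {10, 20, 30}               # a set has no items numbered 0, 1, 2
print(with_for(numbers) == with_while(numbers))
print(with_while(range(3)))
```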

The async for is the asynchronous counterpart: instead of calling the regular __next__, it awaits the asynchronous __anext__, and everything else remains the same. That allows common idioms to be used in async programs:

# 1. print lines of text stored in a file
for line in regular_file:
    print(line)

# 2A. print lines of text as they arrive over the network,
#
# The same idiom as above, but the asynchronous character makes
# it possible to execute other tasks while waiting for new data
async for line in tcp_stream:
    print(line)

# 2B: the same with a spawned command
async for line in running_subprocess.stdout:
    print(line)
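A small self-contained sketch of the asynchronous side, with the loop also written out by hand to show where the await goes. Countdown is a hypothetical async iterable standing in for a network stream, and the manual loop is a simplified equivalent of async for.

```python
import asyncio


class Countdown:
    """Hypothetical async iterable that produces start, start-1, ..., 1."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __aiter__(self) -> "Countdown":
        return self

    async def __anext__(self) -> int:
        if self.current <= 0:
            raise StopAsyncIteration   # async counterpart of StopIteration
        await asyncio.sleep(0)         # stand-in for waiting on new data
        self.current -= 1
        return self.current + 1


async def sugared(start: int) -> list[int]:
    seen = []
    async for item in Countdown(start):
        seen.append(item)
    return seen


async def desugared(start: int) -> list[int]:
    seen = []
    iterator = Countdown(start).__aiter__()
    while True:
        try:
            item = await iterator.__anext__()  # the await hidden by async for
        except StopAsyncIteration:
            break
        seen.append(item)
    return seen


print(asyncio.run(sugared(3)))
print(asyncio.run(desugared(3)))
```

While __anext__ is suspended, the event loop is free to run other tasks, which is what makes the tcp_stream and subprocess idioms above useful.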

The situation with async with is similar. To summarize: the try .. finally construct was replaced by the more convenient with block, now considered idiomatic, which can communicate with anything supporting the context manager protocol through its __enter__ and __exit__ methods for entering and exiting the block. Naturally, everything formerly used in a try .. finally was rewritten to become a context manager (locks, pairs of open-close calls, etc.).

async with is again a counterpart with asynchronous __aenter__ and __aexit__ special methods. Other tasks may run while the asynchronous code for entering or exiting a with block waits for new data or a lock or some other condition to become fulfilled.

Note: unlike with for, it used to be possible to use asynchronous objects with the plain (not async) with statement, as in with await lock:. That form is deprecated or unsupported now (and it was never an exact equivalent of async with).
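For the lock case, a minimal sketch with asyncio.Lock and async with might look like this. The worker function and the order list are hypothetical; the sleep only simulates work done while holding the lock.

```python
import asyncio

order: list[str] = []


async def worker(name: str, lock: asyncio.Lock) -> None:
    async with lock:                 # may suspend until the lock is free
        order.append(f"{name} acquired")
        await asyncio.sleep(0.01)    # hold the lock while "working"
        order.append(f"{name} released")


async def main() -> None:
    lock = asyncio.Lock()            # created inside the running event loop
    await asyncio.gather(worker("a", lock), worker("b", lock))


asyncio.run(main())
print(order)
```

The second worker waits inside the entering step of async with without blocking the event loop, and the two critical sections never interleave.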

answered Oct 07 '22 13:10 by VPfB


My understanding of async with is that it allows Python to await inside the context manager's enter and exit steps without raising an error. Removing the async from the with results in errors. This is useful because the object created is most likely going to do expensive I/O operations that we have to wait for, so we will likely await methods of the object produced by this asynchronous context manager. Without it, opening and closing the context manager correctly would likely cause problems (otherwise why burden Python users with even more nuanced syntax and semantics to learn?).

I have not fully tested what async for does or its intricacies, but I would love to see an example; I might test it once I need it and update this answer. I will put the example here once I get to it: https://github.com/brando90/ultimate-utils/blob/master/tutorials_for_myself/concurrency/asyncio_for.py

For now, see my annotated example with async with (the script lives at https://github.com/brando90/ultimate-utils/blob/master/tutorials_for_myself/concurrency/asyncio_my_example.py):

"""
1. https://realpython.com/async-io-python/#the-asyncawait-syntax-and-native-coroutines
2. https://realpython.com/python-concurrency/
3. https://stackoverflow.com/questions/67092070/why-do-we-need-async-for-and-async-with

todo - async with, async for.

todo: meaning of:
    - The async for and async with statements are only needed to the extent that using plain for or with would “break”
        the nature of await in the coroutine. This distinction between asynchronicity and concurrency is a key one to grasp
    - One exception to this that you’ll see in the next code is the async with statement, which creates a context
        manager from an object you would normally await. While the semantics are a little different, the idea is the
        same: to flag this context manager as something that can get swapped out.
    - download_site() at the top is almost identical to the threading version with the exception of the async keyword on
        the function definition line and the async with keywords when you actually call session.get().
        You’ll see later why Session can be passed in here rather than using thread-local storage.
    - An asynchronous context manager is a context manager that is able to suspend execution in its enter and exit
        methods.
"""

import asyncio
from asyncio import Task

import time

import aiohttp
from aiohttp.client_reqrep import ClientResponse

from typing import Coroutine


async def download_site(coroutine_name: str, session: aiohttp.ClientSession, url: str) -> ClientResponse:
    """
    Calls an expensive io (get data from a url) using the special session (awaitable) object. Note that not all objects
    are awaitable.
    """
    # - the with statement is bad here in my opinion since async with is already mysterious and it's being used twice
    # async with session.get(url) as response:
    #     print("Read {0} from {1}".format(response.content_length, url))
    # - this won't work since it only creates the coroutine. It **has** to be awaited. The trick to have it be (buggy)
    # synchronous is to have the main coroutine await each task we want in order instead of giving all the tasks we want
    # at once to the event loop, e.g. with asyncio.gather, which takes all coroutines, returns the results in a list and
    # thus doesn't block!
    # response = session.get(url)
    # - right way to do async code is to have this await so someone else can run. Note, if the download_site/ parent
    # program is awaited in a for loop this won't work regardless.
    response = await session.get(url)
    print(f"Read {response.content_length} from {url} using {coroutine_name=}")
    return response

async def download_all_sites_not_actually_async_buggy(sites: list[str]) -> list[ClientResponse]:
    """
    Code to demo the non-async behaviour. The code isn't truly asynchronous/concurrent because we are awaiting all the
    io calls (to the network) one by one in the for loop. To avoid this issue, give the list of coroutines to a function
    that actually dispatches the io, like asyncio.gather.

    My understanding is that async with allows the object given to be an awaitable object. This means that the object
    created is an object that does io calls so it might block so it's often the case we await it. Recall that when we
    run await f() f is either 1) a coroutine that gains control (but might block code!) or 2) an io call that takes a
    long time. But because of how python works, after the await finishes the program expects the response to "actually
    be there". Thus, doing await blindly doesn't speed up the code. Do awaits on real io calls and call them with things
    that give them to the event loop (e.g. asyncio.gather).

    """
    # - create an awaitable object without having the context manager explode if it gives up execution.
    # - crucially, the session is an aiohttp session - so its requests are awaitable and could be given to
    # - asyncio.gather; this buggy version does not do that and therefore gains nothing from concurrency
    async with aiohttp.ClientSession() as session:
    # with aiohttp.ClientSession() as session:  # won't work: ClientSession is an asynchronous context manager
        responses: list[ClientResponse] = []
        for i, url in enumerate(sites):
            # - awaiting inside the loop finishes each download before the next one starts (this is the "bug").
            # - the original also scheduled a task per url here and never awaited it, downloading every url twice
            response: ClientResponse = await download_site(f'coroutine{i}', session, url)
            responses.append(response)
        return responses


async def download_all_sites_truly_async(sites: list[str]) -> list[ClientResponse]:
    """
    Truly async program that creates a bunch of coroutines that download data from urls and then uses gather to
    have the event loop run them asynchronously (and thus efficiently). Note there is only one process though.
    """
    # - indicates that session is an async obj that will likely be awaited since it likely does an expensive io that
    # - waits so it wants to give control back to the event loop or other coroutines so they can do stuff while the
    # - io happens
    async with aiohttp.ClientSession() as session:
        tasks: list[Task] = []
        for i, url in enumerate(sites):
            task: Task = asyncio.ensure_future(download_site(f'coroutine{i}', session, url))
            tasks.append(task)
        responses: list[ClientResponse] = await asyncio.gather(*tasks, return_exceptions=True)
        return responses


if __name__ == "__main__":
    # - args
    sites = ["https://www.jython.org", "http://olympus.realpython.org/dice"] * 80
    start_time = time.time()

    # - run main async code
    # main_coroutine: Coroutine = download_all_sites_truly_async(sites)
    main_coroutine: Coroutine = download_all_sites_not_actually_async_buggy(sites)
    responses: list[ClientResponse] = asyncio.run(main_coroutine)

    # - print stats
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")
    print('Success, done!\a')
answered Oct 07 '22 13:10 by Charlie Parker