
Can I await an enumerable I create with a generator?

Let's say I have a sequence of integers I obtain asynchronously.

async Task<int> GetI(int i){
    return await Task.Delay(1000).ContinueWith(x => i);
}

I want to create a generator over that sequence. If the sequence were synchronous, I'd write:

IEnumerable<int> Method()
{
    for (var i = 0; i < 100; i++)
    {
        yield return GetI(i); // won't compile, since GetI returns a Task<int>
    }
}

So, I figured the analogous approach is to make the generator async and yield from it:

async Task<IEnumerable<int>> Method()    
{
    for (var i = 0; i < 100; i++)
    {
        yield return await Task.Delay(1000).ContinueWith(x => i);
    }
}

This won't compile, since a method that uses yield must return an IEnumerable of something. The alternative that makes more sense is IEnumerable<Task<int>>, but that won't compile either, since async methods must return a Task or void.

Now, I realize I can simply remove the await and return an IEnumerable<Task<int>>, but that won't help me: the iteration will keep asking for data before any of it is ready, so it doesn't solve my issue.

  • Is there any way to nicely mix enumerables and tasks with the nice sugar the language gives me with await and yield?
  • Is there any way to nicely consume it?

(From searching online, I suspect the answer to the first question is no and the answer to the second is an observer/observable, but I couldn't find any canonical reference, and I'm interested in the best way to implement this pattern in C#.)

Benjamin Gruenbaum asked Jun 15 '14 06:06


1 Answer

Asynchronous sequences are interesting. There are a number of different approaches, depending on exactly what you want to do. I'm not entirely clear on your desired semantics, so these are some of the options.

Task<IEnumerable<T>> is an asynchronously-retrieved collection. There is only one task - one asynchronous operation - that retrieves the entire collection. This does not sound like it's what you want.
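As a minimal sketch of this first shape: a single task completes only after every element has been fetched. The class name, the method names, and the 10 ms delay are illustrative assumptions; GetI is a simplified stand-in for the method in the question.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WholeCollectionSketch
{
    // Stand-in for the question's GetI; the 10 ms delay is an arbitrary choice.
    public static async Task<int> GetI(int i)
    {
        await Task.Delay(10);
        return i;
    }

    // One task for the entire collection: the caller sees nothing
    // until every element has been retrieved.
    public static async Task<IEnumerable<int>> GetAllAsync(int count)
    {
        var results = new List<int>(count);
        for (var i = 0; i < count; i++)
        {
            results.Add(await GetI(i));
        }
        return results;
    }
}
```

A caller would write something like `var all = await WholeCollectionSketch.GetAllAsync(5);` and only then iterate `all` synchronously.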

IEnumerable<Task<T>> is a (synchronous) sequence of (asynchronous) data. There are multiple tasks, which may or may not all be running simultaneously. There are a couple of options for implementing this. One is to use an enumerator block that yields tasks; this approach starts a new asynchronous operation each time the next item is retrieved from the enumerable. Alternatively, you can create and return a collection of tasks that are all running concurrently (this can be done elegantly over a source sequence via LINQ's Select followed by ToList/ToArray). However, IEnumerable<Task<T>> has a couple of drawbacks: there is no way to asynchronously determine whether the sequence has already ended, and it's not easy to start processing the next item immediately after returning the current one (which is commonly the desired behavior).
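Both implementation options can be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are hypothetical, and GetI is a simplified stand-in for the question's method with a shorter delay. With the enumerator block, awaiting each task inside foreach means the next operation is not started until the current one has finished.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TaskSequenceSketch
{
    // Stand-in for the question's GetI; the 10 ms delay is an arbitrary choice.
    public static async Task<int> GetI(int i)
    {
        await Task.Delay(10);
        return i;
    }

    // Option 1, enumerator block: each MoveNext starts one new operation.
    public static IEnumerable<Task<int>> Method(int count)
    {
        for (var i = 0; i < count; i++)
        {
            yield return GetI(i);
        }
    }

    // Consuming option 1: the await inside the loop keeps the operations
    // sequential, because the iterator is not advanced while we are waiting.
    public static async Task<List<int>> ConsumeSequentiallyAsync(int count)
    {
        var seen = new List<int>();
        foreach (var task in Method(count))
        {
            seen.Add(await task);
        }
        return seen;
    }

    // Option 2: start every operation up front with Select + ToList,
    // then wait for all of them; results keep the order of the tasks.
    public static Task<int[]> ConsumeConcurrentlyAsync(int count)
    {
        var tasks = Enumerable.Range(0, count).Select(i => GetI(i)).ToList();
        return Task.WhenAll(tasks);
    }
}
```

Note that the consumer still has to know synchronously whether another task exists, which is the first drawback mentioned above.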

The core problem is that IEnumerable<T> is inherently synchronous. There are a couple of workarounds. One is IAsyncEnumerable<T>, which is an asynchronous equivalent of IEnumerable<T> and available in the Ix-Async NuGet package. This approach has its own drawbacks, though. Of course, you lose the nice language support for IEnumerable<T> (namely, enumerator blocks and foreach). Also, the very notion of an "asynchronous enumerable" is not exactly performant; ideally, asynchronous APIs should be chunky rather than chatty, and enumerables are very chatty. More discussion on the original design here, and on the chunky/chatty considerations here.
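The loss of language support described here reflects the state of C# when this answer was written. From C# 8.0 onward, the language has async iterators: a method returning IAsyncEnumerable<T> (now in System.Collections.Generic) may use both await and yield return, and await foreach consumes it. A minimal sketch, assuming that language version; the class and method names and the 10 ms delay are illustrative.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncStreamSketch
{
    // Async iterator (C# 8.0+): await and yield return in the same method.
    public static async IAsyncEnumerable<int> Method(int count)
    {
        for (var i = 0; i < count; i++)
        {
            await Task.Delay(10); // simulate an asynchronous source
            yield return i;
        }
    }

    // await foreach requests the next item only after the loop body
    // for the current item has finished.
    public static async Task<List<int>> CollectAsync(int count)
    {
        var seen = new List<int>();
        await foreach (var value in Method(count))
        {
            seen.Add(value);
        }
        return seen;
    }
}
```

The chunky-versus-chatty concern still applies to this shape, since each element is a separate asynchronous step.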

So, these days a much more common solution is to use observables or dataflows (both also available via NuGet). In these cases, you have to think of the "sequence" as something with a life of its own. Observables are push-based, so the consuming code is (ideally) reactive. Dataflows have an actor feel, so they act more independently, again pushing results to the consuming code.
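To give a feel for the dataflow style, here is a small sketch using TPL Dataflow, assuming the System.Threading.Tasks.Dataflow package is referenced. The class name, method name, and 10 ms delay are hypothetical choices for illustration, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowSketch
{
    public static async Task<List<int>> RunAsync(int count)
    {
        // Each posted item is transformed asynchronously; by default the
        // block emits outputs in the same order the inputs arrived.
        var fetch = new TransformBlock<int, int>(async i =>
        {
            await Task.Delay(10); // simulate an asynchronous source
            return i;
        });

        var seen = new List<int>();
        // ActionBlock handles one item at a time by default, so the list
        // is never touched by two invocations concurrently.
        var consume = new ActionBlock<int>(value => seen.Add(value));

        // Results are pushed to the consumer; completion flows downstream.
        fetch.LinkTo(consume, new DataflowLinkOptions { PropagateCompletion = true });

        for (var i = 0; i < count; i++)
        {
            fetch.Post(i);
        }
        fetch.Complete();

        await consume.Completion;
        return seen;
    }
}
```

The consuming code never pulls items here; it only declares what should happen to each result and waits for the pipeline to finish.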

Stephen Cleary answered Sep 19 '22 21:09