How do yield and await implement flow of control in .NET?

As I understand the yield keyword, if used from inside an iterator block, it returns flow of control to the calling code, and when the iterator is called again, it picks up where it left off.

Also, await not only waits for the callee, but it returns control to the caller, only to pick up where it left off when the caller awaits the method.

In other words-- there is no thread, and the "concurrency" of async and await is an illusion caused by clever flow of control, the details of which are concealed by the syntax.

Now, I'm a former assembly programmer and I'm very familiar with instruction pointers, stacks, etc. and I get how normal flows of control (subroutine, recursion, loops, branches) work. But these new constructs-- I don't get them.

When an await is reached, how does the runtime know what piece of code should execute next? How does it know when it can resume where it left off, and how does it remember where? What happens to the current call stack, does it get saved somehow? What if the calling method makes other method calls before it awaits-- why doesn't the stack get overwritten? And how on earth would the runtime work its way through all this in the case of an exception and a stack unwind?

When yield is reached, how does the runtime keep track of the point where things should be picked up? How is iterator state preserved?

John Wu asked Feb 17 '17 01:02


2 Answers

I'll answer your specific questions below, but you would likely do well to simply read my extensive articles on how we designed yield and await.

https://blogs.msdn.microsoft.com/ericlippert/tag/continuation-passing-style/

https://blogs.msdn.microsoft.com/ericlippert/tag/iterators/

https://blogs.msdn.microsoft.com/ericlippert/tag/async/

Some of these articles are out of date now; the code generated is different in a lot of ways. But these will certainly give you the idea of how it works.

Also, if you do not understand how lambdas are generated as closure classes, understand that first. You won't make heads or tails of async if you don't have lambdas down.

When an await is reached, how does the runtime know what piece of code should execute next?

await is generated as:

    if (the task is not completed)
        assign a delegate which executes the remainder of the method as the continuation of the task
        return to the caller
    else
        execute the remainder of the method now

That's basically it. Await is just a fancy return.
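The pseudocode above can be written out by hand against the real awaiter API. This is only a sketch of the idea, not what the compiler emits: the class name, the method name, and the `report` callback are made up for illustration, and the real generated code uses a state machine rather than a lambda.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class FancyReturnSketch
{
    // Hand-written stand-in for:
    //     int value = await task;
    //     report(value + 1);
    // `report` is a hypothetical callback used only to observe the result.
    public static void AddOneThenReport(Task<int> task, Action<int> report)
    {
        TaskAwaiter<int> awaiter = task.GetAwaiter();

        if (!awaiter.IsCompleted)
        {
            // Not done yet: hand "the remainder of the method" to the task
            // as a continuation, then simply return to the caller.
            awaiter.OnCompleted(() => report(awaiter.GetResult() + 1));
            return;
        }

        // Already done: execute the remainder of the method right now.
        report(awaiter.GetResult() + 1);
    }
}
```

Calling `AddOneThenReport(Task.FromResult(41), Console.WriteLine)` takes the second branch and reports synchronously; passing the task of a `TaskCompletionSource<int>` that has not been completed takes the first branch and returns before anything is reported.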

How does it know when it can resume where it left off, and how does it remember where?

Well, how do you do that without await? When method foo calls method bar, somehow we remember how to get back to the middle of foo, with all the locals of the activation of foo intact, no matter what bar does.

You know how that's done in assembler. An activation record for foo is pushed onto the stack; it contains the values of the locals. At the point of the call the return address in foo is pushed onto the stack. When bar is done, the stack pointer and instruction pointer are reset to where they need to be and foo keeps going from where it left off.

The continuation of an await is exactly the same, except that the record is put onto the heap for the obvious reason that the sequence of activations does not form a stack.

The delegate which await gives as the continuation to the task contains (1) a number which is the input to a lookup table that gives the instruction pointer that you need to execute next, and (2) all the values of locals and temporaries.

There is some additional gear in there; for instance, in .NET it is illegal to branch into the middle of a try block, so you can't simply stick the address of code inside a try block into the table. But these are bookkeeping details. Conceptually, the activation record is simply moved onto the heap.

What happens to the current call stack, does it get saved somehow?

The relevant information in the current activation record is never put on the stack in the first place; it is allocated off the heap from the get-go. (Well, formal parameters are passed on the stack or in registers normally and then copied into a heap location when the method begins.)

The activation records of the callers are not stored; the await is probably going to return to them, remember, so they'll be dealt with normally.

Note that this is a germane difference between the simplified continuation passing style of await, and true call-with-current-continuation structures that you see in languages like Scheme. In those languages the entire continuation including the continuation back into the callers is captured by call-cc.

What if the calling method makes other method calls before it awaits-- why doesn't the stack get overwritten?

Those method calls return, and so their activation records are no longer on the stack at the point of the await.

And how on earth would the runtime work its way through all this in the case of an exception and a stack unwind?

In the event of an uncaught exception, the exception is caught, stored inside the task, and re-thrown when the task's result is fetched.

Remember all that bookkeeping I mentioned before? Getting exception semantics right was a huge pain, let me tell you.
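A small sketch of the observable behaviour, with names invented for the example: the exception thrown inside the async method does not propagate out of the call itself. It is stored in the returned task and only surfaces when that task is awaited.

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionFlowSketch
{
    // Hypothetical failing operation. The throw happens inside the async
    // method, so it is captured into the returned Task<int>.
    public static async Task<int> FailAsync()
    {
        await Task.CompletedTask;
        throw new InvalidOperationException("boom");
    }

    public static async Task<string> ObserveAsync()
    {
        // This call does not throw; it returns a faulted task.
        Task<int> task = FailAsync();

        try
        {
            // Fetching the result re-throws the stored exception here.
            int value = await task;
            return "no exception, got " + value;
        }
        catch (InvalidOperationException ex)
        {
            return "caught: " + ex.Message;
        }
    }
}
```

Because `FailAsync` never actually suspends in this sketch, the task it returns is already faulted by the time the caller gets it; the same capture-and-rethrow happens when the failure occurs after a real suspension.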

When yield is reached, how does the runtime keep track of the point where things should be picked up? How is iterator state preserved?

Same way. The state of locals is moved onto the heap, and a number representing the instruction at which MoveNext should resume the next time it is called is stored along with the locals.

And again, there's a bunch of gear in an iterator block to make sure that exceptions are handled correctly.
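One way to see that saved state from the outside is to drive an iterator by hand and watch when its side effects happen. This is an illustrative sketch; the `Numbers` method and its `log` parameter are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorStateSketch
{
    // Each log entry records how far the iterator body has actually run.
    public static IEnumerable<int> Numbers(List<string> log)
    {
        log.Add("start");
        yield return 1;
        log.Add("between");
        yield return 2;
        log.Add("end");
    }

    public static void Demo()
    {
        var log = new List<string>();
        using (IEnumerator<int> e = Numbers(log).GetEnumerator())
        {
            // Nothing in the body has run yet: log is still empty here.
            e.MoveNext();   // runs up to the first yield; log: start
            e.MoveNext();   // resumes after the first yield; log: start, between
            e.MoveNext();   // runs to the end and returns false; log: start, between, end
        }
        Console.WriteLine(string.Join(", ", log));
    }
}
```

Each `MoveNext` call picks up exactly where the previous one stopped, which is only possible because the resume point and the locals live in the enumerator object rather than on the stack.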

Eric Lippert answered Oct 13 '22 14:10


yield is the easier of the two, so let's examine it.

Say we have:

    public IEnumerable<int> CountToTen()
    {
        for (int i = 1; i <= 10; ++i)
        {
            yield return i;
        }
    }

This gets compiled a bit like if we'd written:

    // Deliberately use a name that isn't valid C# so it can't clash with anything
    private class <CountToTen> : IEnumerator<int>, IEnumerable<int>
    {
        private int _i;
        private int _current;
        private int _state;
        private int _initialThreadId = Environment.CurrentManagedThreadId;

        public IEnumerator<int> GetEnumerator()
        {
            // Use self if never run and on the same thread (so safe);
            // otherwise create a new object, already in its starting state.
            if (_state != 0 || _initialThreadId != Environment.CurrentManagedThreadId)
            {
                return new <CountToTen> { _state = 1 };
            }

            _state = 1;
            return this;
        }

        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

        public int Current => _current;

        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            switch (_state)
            {
                case 1:
                    // First call: run the loop initializer and yield the first value.
                    _i = 1;
                    _current = _i;
                    _state = 2;
                    return true;
                case 2:
                    // Later calls: resume after the yield, at the loop increment.
                    ++_i;
                    if (_i <= 10)
                    {
                        _current = _i;
                        return true;
                    }
                    break;
            }
            _state = -1;
            return false;
        }

        public void Dispose()
        {
            // if the yield-using method had a `using` it would
            // be translated into something happening here.
        }

        public void Reset()
        {
            throw new NotSupportedException();
        }
    }

So, not as efficient as a hand-written implementation of IEnumerable<int> and IEnumerator<int> (e.g. we would likely not waste having a separate _state, _i and _current in this case) but not bad (the trick of re-using itself when safe to do so rather than creating a new object is good), and extensible to deal with very complicated yield-using methods.

And of course, since

    foreach (var a in b)
    {
        DoSomething(a);
    }

is the same as:

    using (var en = b.GetEnumerator())
    {
        while (en.MoveNext())
        {
            var a = en.Current;
            DoSomething(a);
        }
    }

the generated MoveNext() is called repeatedly.

The async case follows pretty much the same principle, but with a bit of extra complexity. To reuse an example from another answer, code like:

    private async Task LoopAsync()
    {
        int count = 0;
        while (count < 5)
        {
            await SomeNetworkCallAsync();
            count++;
        }
    }

Produces code like:

    private struct LoopAsyncStateMachine : IAsyncStateMachine
    {
        public int _state;
        public AsyncTaskMethodBuilder _builder;
        public TestAsync _this;
        public int _count;
        private TaskAwaiter _awaiter;

        void IAsyncStateMachine.MoveNext()
        {
            try
            {
                if (_state != 0)
                {
                    // First entry (_state == -1): run the code before the loop.
                    _count = 0;
                    goto afterSetup;
                }
                // Resuming after a suspended await: pick the awaiter back up.
                TaskAwaiter awaiter = _awaiter;
                _awaiter = default(TaskAwaiter);
                _state = -1;
            loopBack:
                awaiter.GetResult();
                awaiter = default(TaskAwaiter);
                _count++;
            afterSetup:
                if (_count < 5)
                {
                    awaiter = _this.SomeNetworkCallAsync().GetAwaiter();
                    if (!awaiter.IsCompleted)
                    {
                        // Suspend: save state, register MoveNext as the continuation, return.
                        _state = 0;
                        _awaiter = awaiter;
                        _builder.AwaitUnsafeOnCompleted<TaskAwaiter, TestAsync.LoopAsyncStateMachine>(ref awaiter, ref this);
                        return;
                    }
                    goto loopBack;
                }
                _state = -2;
                _builder.SetResult();
            }
            catch (Exception exception)
            {
                _state = -2;
                _builder.SetException(exception);
                return;
            }
        }

        [DebuggerHidden]
        void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0)
        {
            _builder.SetStateMachine(param0);
        }
    }

    private Task LoopAsync()
    {
        LoopAsyncStateMachine stateMachine = new LoopAsyncStateMachine();
        stateMachine._this = this;
        AsyncTaskMethodBuilder builder = AsyncTaskMethodBuilder.Create();
        stateMachine._builder = builder;
        stateMachine._state = -1;
        builder.Start(ref stateMachine);
        return builder.Task;
    }

It's more complicated, but the basic principle is very similar. The main extra complication is that GetAwaiter() is now being used. Whenever awaiter.IsCompleted is checked and returns true because the awaited task has already completed (e.g. cases where it could return synchronously), the method keeps moving through its states; otherwise it sets itself up as a callback to the awaiter.

Just what happens with that callback depends on the awaiter: what triggers it (e.g. async I/O completion, or a task running on a thread completing), what requirements there are for marshalling to a particular thread or running on a thread-pool thread, what context from the original call may or may not be needed, and so on. Whatever it is, though, something in that awaiter will call MoveNext, which will either continue with the next piece of work (up to the next await) or finish and return, in which case the Task that it is implementing becomes completed.
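To make "something in that awaiter will call MoveNext" concrete, here is a sketch of a deliberately trivial custom awaiter that just stores the callback and runs it when told to. `ManualSignal`, `Fire` and `WaitForSignalAsync` are hypothetical names invented for this example; the only real requirements are the awaiter pattern members (`GetAwaiter`, `IsCompleted`, `OnCompleted`, `GetResult`).

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// A hypothetical awaitable that is its own awaiter.
public sealed class ManualSignal : INotifyCompletion
{
    private Action _continuation = () => { };

    public bool IsCompleted { get; private set; }

    public ManualSignal GetAwaiter() => this;

    // Called by the async machinery when IsCompleted was false:
    // the action it passes in runs the state machine's MoveNext.
    public void OnCompleted(Action continuation) => _continuation = continuation;

    public void GetResult() { }

    // Whoever calls Fire is the "something" that resumes the method.
    public void Fire()
    {
        IsCompleted = true;
        _continuation();
    }
}

public static class AwaiterSketch
{
    public static async Task<string> WaitForSignalAsync(ManualSignal signal, List<string> log)
    {
        log.Add("before await");
        await signal;            // suspends here until Fire() is called
        log.Add("after await");
        return "done";
    }
}
```

Calling `WaitForSignalAsync` returns an incomplete task after logging "before await". No thread is waiting anywhere; the rest of the method runs inside the later call to `signal.Fire()`, on whatever thread makes that call, and the task completes as part of it.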

Jon Hanna answered Oct 13 '22 13:10