As I understand the yield keyword, if used from inside an iterator block, it returns flow of control to the calling code, and when the iterator is called again, it picks up where it left off.
Also, await not only waits for the callee, but it returns control to the caller, only to pick up where it left off when the caller awaits the method.
In other words-- there is no thread, and the "concurrency" of async and await is an illusion caused by clever flow of control, the details of which are concealed by the syntax.
Now, I'm a former assembly programmer and I'm very familiar with instruction pointers, stacks, etc. and I get how normal flows of control (subroutine, recursion, loops, branches) work. But these new constructs-- I don't get them.
When an await is reached, how does the runtime know what piece of code should execute next?  How does it know when it can resume where it left off, and how does it remember where?  What happens to the current call stack, does it get saved somehow?  What if the calling method makes other method calls before it awaits-- why doesn't the stack get overwritten?  And how on earth would the runtime work its way through all this in the case of an exception and a stack unwind?
When yield is reached, how does the runtime keep track of the point where things should be picked up?  How is iterator state preserved?
I'll answer your specific questions below, but you would likely do well to simply read my extensive articles on how we designed yield and await.
https://blogs.msdn.microsoft.com/ericlippert/tag/continuation-passing-style/
https://blogs.msdn.microsoft.com/ericlippert/tag/iterators/
https://blogs.msdn.microsoft.com/ericlippert/tag/async/
Some of these articles are out of date now; the code generated is different in a lot of ways. But these will certainly give you the idea of how it works.
Also, if you do not understand how lambdas are generated as closure classes, understand that first. You won't make heads or tails of async if you don't have lambdas down.
When an await is reached, how does the runtime know what piece of code should execute next?
await is generated as:
if (the task is not completed)
  assign a delegate which executes the remainder of the method as the continuation of the task
  return to the caller
else
  execute the remainder of the method now
That's basically it. Await is just a fancy return.
How does it know when it can resume where it left off, and how does it remember where?
Well, how do you do that without await? When method foo calls method bar, somehow we remember how to get back to the middle of foo, with all the locals of the activation of foo intact, no matter what bar does.
You know how that's done in assembler. An activation record for foo is pushed onto the stack; it contains the values of the locals. At the point of the call the return address in foo is pushed onto the stack. When bar is done, the stack pointer and instruction pointer are reset to where they need to be and foo keeps going from where it left off.
The continuation of an await is exactly the same, except that the record is put onto the heap for the obvious reason that the sequence of activations does not form a stack.
The delegate which await gives as the continuation to the task contains (1) a number which is the input to a lookup table that gives the instruction pointer that you need to execute next, and (2) all the values of locals and temporaries.
There is some additional gear in there; for instance, in .NET it is illegal to branch into the middle of a try block, so you can't simply stick the address of code inside a try block into the table. But these are bookkeeping details. Conceptually, the activation record is simply moved onto the heap.
What happens to the current call stack, does it get saved somehow?
The relevant information in the current activation record is never put on the stack in the first place; it is allocated off the heap from the get-go. (Well, formal parameters are passed on the stack or in registers normally and then copied into a heap location when the method begins.)
The activation records of the callers are not stored; the await is probably going to return to them, remember, so they'll be dealt with normally.
Note that this is a germane difference between the simplified continuation passing style of await, and true call-with-current-continuation structures that you see in languages like Scheme. In those languages the entire continuation including the continuation back into the callers is captured by call-cc.
What if the calling method makes other method calls before it awaits-- why doesn't the stack get overwritten?
Those method calls return, and so their activation records are no longer on the stack at the point of the await.
And how on earth would the runtime work its way through all this in the case of an exception and a stack unwind?
In the event of an uncaught exception, the exception is caught, stored inside the task, and re-thrown when the task's result is fetched.
Remember all that bookkeeping I mentioned before? Getting exception semantics right was a huge pain, let me tell you.
When yield is reached, how does the runtime keep track of the point where things should be picked up? How is iterator state preserved?
Same way. The state of locals is moved onto the heap, and a number representing the instruction at which MoveNext should resume the next time it is called is stored along with the locals. 
And again, there's a bunch of gear in an iterator block to make sure that exceptions are handled correctly.
yield is the easier of the two, so let's examine it.
Say we have:
public IEnumerable<int> CountToTen()
{
  for (int i = 1; i <= 10; ++i)
  {
    yield return i;
  }
}
This gets compiled a bit like if we'd written:
// Deliberately use name that isn't valid C# to not clash with anything
private class <CountToTen> : IEnumerator<int>, IEnumerable<int>
{
    private int _i;
    private int _current;
    private int _state;
    private int _initialThreadId = CurrentManagedThreadId;
    public IEnumerator<CountToTen> GetEnumerator()
    {
        // Use self if never ran and same thread (so safe)
        // otherwise create a new object.
        if (_state != 0 || _initialThreadId != CurrentManagedThreadId)
        {
            return new <CountToTen>();
        }
        _state = 1;
        return this;
    }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    public int Current => _current;
    object IEnumerator.Current => Current;
    public bool MoveNext()
    {
        switch(_state)
        {
            case 1:
                _i = 1;
                _current = i;
                _state = 2;
                return true;
            case 2:
                ++_i;
                if (_i <= 10)
                {
                    _current = _i;
                    return true;
                }
                break;
        }
        _state = -1;
        return false;
    }
    public void Dispose()
    {
      // if the yield-using method had a `using` it would
      // be translated into something happening here.
    }
    public void Reset()
    {
        throw new NotSupportedException();
    }
}
So, not as efficient as a hand-written implementation of IEnumerable<int> and IEnumerator<int> (e.g. we would likely not waste having a separate _state, _i and _current in this case) but not bad (the trick of re-using itself when safe to do so rather than creating a new object is good), and extensible to deal with very complicated yield-using methods.
And of course since
foreach(var a in b)
{
  DoSomething(a);
}
Is the same as:
using(var en = b.GetEnumerator())
{
  while(en.MoveNext())
  {
     var a = en.Current;
     DoSomething(a);
  }
}
Then the generated MoveNext() is repeatedly called.
The async case is pretty much the same principle, but with a bit of extra complexity. To reuse an example from another answer Code like:
private async Task LoopAsync()
{
    int count = 0;
    while(count < 5)
    {
       await SomeNetworkCallAsync();
       count++;
    }
}
Produces code like:
private struct LoopAsyncStateMachine : IAsyncStateMachine
{
  public int _state;
  public AsyncTaskMethodBuilder _builder;
  public TestAsync _this;
  public int _count;
  private TaskAwaiter _awaiter;
  void IAsyncStateMachine.MoveNext()
  {
    try
    {
      if (_state != 0)
      {
        _count = 0;
        goto afterSetup;
      }
      TaskAwaiter awaiter = _awaiter;
      _awaiter = default(TaskAwaiter);
      _state = -1;
    loopBack:
      awaiter.GetResult();
      awaiter = default(TaskAwaiter);
      _count++;
    afterSetup:
      if (_count < 5)
      {
        awaiter = _this.SomeNetworkCallAsync().GetAwaiter();
        if (!awaiter.IsCompleted)
        {
          _state = 0;
          _awaiter = awaiter;
          _builder.AwaitUnsafeOnCompleted<TaskAwaiter, TestAsync.LoopAsyncStateMachine>(ref awaiter, ref this);
          return;
        }
        goto loopBack;
      }
      _state = -2;
      _builder.SetResult();
    }
    catch (Exception exception)
    {
      _state = -2;
      _builder.SetException(exception);
      return;
    }
  }
  [DebuggerHidden]
  void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0)
  {
    _builder.SetStateMachine(param0);
  }
}
public Task LoopAsync()
{
  LoopAsyncStateMachine stateMachine = new LoopAsyncStateMachine();
  stateMachine._this = this;
  AsyncTaskMethodBuilder builder = AsyncTaskMethodBuilder.Create();
  stateMachine._builder = builder;
  stateMachine._state = -1;
  builder.Start(ref stateMachine);
  return builder.Task;
}
It's more complicated, but a very similar basic principle. The main extra complication is that now GetAwaiter() is being used. If any time awaiter.IsCompleted is checked it returns true because the task awaited is already completed (e.g. cases where it could return synchronously) then the method keeps moving through states, but otherwise it sets itself up as a callback to the awaiter.
Just what happens with that depends on the awaiter, in terms of what triggers the callback (e.g. async I/O completion, a task running on a thread completing) and what requirements there are for marshalling to a particular thread or running on a threadpool thread, what context from the original call may or may not be needed and so on. Whatever it is though something in that awaiter will call into the MoveNext and it will either continue with the next piece of work (up to the next await) or finish and return in which case the Task that it is implementing becomes completed.
There are a ton of great answers here already; I'm just going to share a few viewpoints that can help form a mental model.
First, an async method is broken into several pieces by the compiler; the await expressions are the fracture points. (This is easy to conceive for simple methods; more complex methods with loops and exception handling also get broken up, with the addition of a more complex state machine).
Second, await is translated into a fairly simple sequence; I like Lucian's description, which in words is pretty much "if the awaitable is already complete, get the result and continue executing this method; otherwise, save this method's state and return". (I use very similar terminology in my async intro).
When an await is reached, how does the runtime know what piece of code should execute next?
The remainder of the method exists as a callback for that awaitable (in the case of tasks, these callbacks are continuations). When the awaitable completes, it invokes its callbacks.
Note that the call stack is not saved and restored; callbacks are invoked directly. In the case of overlapped I/O, they're invoked directly from the thread pool.
Those callbacks may continue executing the method directly, or they may schedule it to run elsewhere (e.g., if the await captured a UI SynchronizationContext and the I/O completed on the thread pool).
How does it know when it can resume where it left off, and how does it remember where?
It's all just callbacks. When an awaitable completes, it invokes its callbacks, and any async method that had already awaited it gets resumed. The callback jumps into the middle of that method and has its local variables in scope.
The callbacks are not run a particular thread, and they do not have their callstack restored.
What happens to the current call stack, does it get saved somehow? What if the calling method makes other method calls before it awaits-- why doesn't the stack get overwritten? And how on earth would the runtime work its way through all this in the case of an exception and a stack unwind?
The callstack isn't saved in the first place; it isn't necessary.
With synchronous code, you can end up with a call stack that includes all your callers, and the runtime knows where to return using that.
With asynchronous code, you can end up with a bunch of callback pointers - rooted at some I/O operation that finishes its task, which can resume an async method that finishes its task, which can resume an async method that finishes its task, etc.
So, with synchronous code A calling B calling C, your callstack may look like this:
A:B:C
whereas the asynchronous code uses callbacks (pointers):
A <- B <- C <- (I/O operation)
When yield is reached, how does the runtime keep track of the point where things should be picked up? How is iterator state preserved?
Currently, rather inefficiently. :)
It works like any other lambda - variable lifetimes are extended and references are placed into a state object that lives on the stack. The best resource for all the deep-level details is Jon Skeet's EduAsync series.
yield and await are, while both dealing with flow control, two completely different things. So I'll tackle them separately.
The goal of yield is to make it easier to build lazy sequences. When you write an enumerator-loop with a yield statement in it, the compiler generates a ton of new code you don't see. Under the hood, it actually generates a whole new class. The class contains members that track the state of the loop, and an implementation of IEnumerable so that each time you call MoveNext it steps once more through that loop. So when you do a foreach loop like this:
foreach(var item in mything.items()) {
    dosomething(item);
}
the generated code looks something like:
var i = mything.items();
while(i.MoveNext()) {
    dosomething(i.Current);
}
Inside the implementation of mything.items() is a bunch of state-machine code that will do one "step" of the loop then return. So while you write it in the source like a simple loop, under the hood it's not a simple loop. So compiler trickery. If you want to see yourself, pull out ILDASM or ILSpy or similar tools and see what the generated IL looks like. It should be instructive.
async and await, on the other hand, are a whole other kettle of fish. Await is, in the abstract, a synchronization primitive. It's a way to tell the system "I can't continue until this is done." But, as you noted, there's not always a thread involved.
What is involved is something called a synchronization context. There's always one hanging around. The job of they synchronization context is to schedule tasks that are being awaited on and their continuations.
When you say await thisThing(), a couple things happen. In an async method, the compiler actually chops up the method into smaller chunks, each chunk being a "before an await" section and an "after an await" (or continuation) section. When the await executes, the task being awaited, and the following continuation - in other words, the rest of the function - is passed to the synchronization context. The context takes care of scheduling the task, and when it's finished the context then runs the continuation, passing whatever return value it wants.
The sync context is free to do whatever it wants as long as it schedules stuff. It could use the thread pool. It could create a thread per task. It could run them synchronously. Different environments (ASP.NET vs. WPF) provide different sync context implementations that do different things based on what's best for their environments.
(Bonus: ever wondered what .ConfigurateAwait(false) does? It's telling the system not to use the current sync context (usually based on your project type - WPF vs ASP.NET for example) and instead use the default one, which uses the thread pool).
So again, it's a lot of compiler trickery. If you look at the generated code its complicated but you should be able to see what it's doing. These kinds of transformations are hard, but deterministic and mathematical, which is why it's great that the compiler is doing them for us.
P.S. There is one exception to the existence of default sync contexts - console apps don't have a default sync context. Check Stephen Toub's blog for lots more information. It's a great place to look for information on async and await in general. 
Normally, I'd recomment looking at the CIL, but in the case of these, it's a mess.
These two language constructs are similar in working, but implemented a bit differently. Basically, it's just a syntactic sugar for a compiler magic, there's nothing crazy/unsafe at the assembly level. Let's look at them briefly.
yield is an older and simpler statement, and it's a syntactic sugar for a basic state machine. A method returning IEnumerable<T> or IEnumerator<T> may contain a yield, which then transforms the method to a state machine factory. One thing you should notice is that no code in the method is run at the moment when you call it, if there is a yield inside. The reason is that the code you write is translocated to the IEnumerator<T>.MoveNext method, which checks the state it's in and runs the correct portion of the code. yield return x; is then converted to something akin to this.Current = x; return true;
If you do some reflection, you can easily inspect the constructed state machine and its fields (at least one for the state and for the locals). You can even reset it if you change the fields.
await requires a bit of a support from the type library, and works somewhat differently. It takes a Task or Task<T> argument, then either results to its value if the task is completed, or registers a continuation via Task.GetAwaiter().OnCompleted. The full implementation of the async/await system would take too long to explain, but it's also not that mystical. It also creates a state machine and passes it along the continuation to OnCompleted. If the task is completed, it then uses its result in the continuation. The implementation of the awaiter decides how to invoke the continuation. Typically it uses the calling thread's synchronisation context.
Both yield and await have to split the method up based on their occurance to form a state machine, with each branch of the machine representing each part of the method.
You shouldn't think about these concepts in the "lower level" terms like stacks, threads etc. These are abstractions, and their inner workings doesn't require any support from the CLR, it's just the compiler that does the magic. This is wildly different from Lua's coroutines, which do have the runtime's support, or C's longjmp, which is just black magic.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With