
Pattern for writing synchronous and asynchronous methods in libraries and keeping it DRY [duplicate]

I'm modifying a library to add async methods. The article "Should I expose synchronous wrappers for asynchronous methods?" says I shouldn't implement a synchronous method as a thin wrapper that calls Task.Result on the async one. But how do I avoid duplicating a lot of code between the async and sync methods, given that we want to keep both options in the library?

For example, the library currently uses the TextReader.Read method. As part of the asynchronous change we would like to use TextReader.ReadAsync. Since this is at the core of the library, it seems I would need to duplicate a lot of code between the synchronous and asynchronous methods (I want to keep the code as DRY as possible). Or I would need to refactor the shared parts into PreRead and PostRead methods, which clutters the code and is exactly what the TPL was trying to fix.

I'm thinking about just wrapping the TextReader.Read call in Task.FromResult(). Even though it is a task, the TPL should not switch it to a different thread, and I can still use async/await for most of the code as normal. Would it then be ok to have a wrapper of the synchronous to be just Task.Result or Wait()?

I looked at other examples in the .NET library. StreamReader seems to duplicate the code between the async and non-async methods. MemoryStream uses Task.FromResult.

I'm also planning to add ConfigureAwait(false) everywhere I can, since this is just a library.

Update:

Here is what I mean by duplicated code:

 public decimal ReadDecimal()
 {
     do
     {
         if (!Read())
         {
             SetInternalProperies();
         }
         else
         {
             return _reader.AsDecimal();
         }
     } while (_reader.hasValue);
 }

 public async Task<decimal> ReadDecimalAsync()
 {
     do
     {
         if (!await ReadAsync())
         {
             SetInternalProperies();
         }
         else
         {
             return _reader.AsDecimal();
         }
     } while (_reader.hasValue);
 }

This is a small example, but you can see that the only differences are the await and the Task return type.

To be clear, I want to code using async/await and the TPL everywhere in the library, but I still need the old sync methods to work as well. I'm not about to just Task.FromResult() the sync methods. What I was thinking of was a flag that says I want the sync behavior, checked at the root, something like:

 public decimal ReadDecimal()
 {
     return ReadDecimalAsyncInternal(true).Result;
 }

 public async Task<decimal> ReadDecimalAsync()
 {
     return await ReadDecimalAsyncInternal(false);
 }

 private async Task<decimal> ReadDecimalAsyncInternal(bool syncRequest)
 {
     do
     {
         if (!await ReadAsync(syncRequest))
         {
             SetInternalProperies();
         }
         else
         {
             return _reader.AsDecimal();
         }
     } while (_reader.hasValue);
 }

 private Task<bool> ReadAsync(bool syncRequest)
 {
     if (syncRequest)
     {
         // Sync path: do the read inline and wrap it in a completed task.
         return Task.FromResult(streamReader.Read());
     }
     else
     {
         // Async path: use real async IO.
         return streamReader.ReadAsync();
     }
 }

CharlesNRice asked Jan 15 '15 19:01


2 Answers

You want to add async methods in addition to the synchronous ones in your lib. The article you linked to talks about exactly that. It recommends creating specialized code for both versions.

Now that advice is usually given because:

  1. Async methods should be low-latency. For efficiency they should use async IO internally.
  2. Sync methods should use sync IO internally for efficiency reasons.

If you create wrappers you might mislead callers.

Now, it is a valid strategy to create wrappers both ways if you are OK with the consequences. It certainly saves a lot of code. But you will have to decide whether to give preference to the sync or to the async version. The other one will be less efficient and have no performance-based reason to exist.

You will rarely find this in the BCL because the quality of implementation is high. But for example ADO.NET 4.5's SqlConnection class uses sync-over-async. The cost of doing a SQL call is far greater than the sync overhead. That's an OK use case. MemoryStream uses (kind of) async-over-sync because it is inherently CPU-only work but it must implement Stream.

What's the overhead actually? Expect to be able to run >100 million Task.FromResult per second and millions of almost zero work Task.Run per second. That's small overhead compared to many things.


Please see the comments below for an interesting discussion. To preserve that content I am copying some comments into this answer. In copying I tried to leave out subjective remarks as much as possible since this answer is meant to be objectively true. The full discussion is below.

It is possible to reliably avoid deadlocks. For example, ADO.NET uses sync-over-async in recent versions. You can see this when pausing the debugger while queries are running and looking at the call stack. It is common wisdom that sync-over-async is problematic and it's true. But it is false that you categorically cannot use it. It is a trade-off.

The following pattern is always safe (just an example): Task.Run(() => Async()).Wait();. This is safe because Async is called without synchronization context. The deadlock potential normally comes from an async method capturing the sync context, which is single threaded, and then wanting to re-enter it. An alternative is to consistently use ConfigureAwait(false) which is error prone (one error deadlocks your production app at 4:00 in the morning). Another alternative is SetSyncContext(null); var task = Async(); SetSyncContext(previous);.
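The two blocking patterns above might look like the following minimal sketch. `FetchValueAsync` is a hypothetical stand-in for any async library method, and the names `SyncBridge`, `RunOnThreadPool`, and `RunWithoutContext` are assumptions made for illustration. `SetSyncContext` in the text is taken here to mean `SynchronizationContext.SetSynchronizationContext`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SyncBridge
{
    // Hypothetical async method standing in for real async IO.
    // It deliberately omits ConfigureAwait(false).
    public static async Task<int> FetchValueAsync()
    {
        await Task.Delay(10);
        return 42;
    }

    // Pattern 1: start the async method on a thread-pool thread, where
    // SynchronizationContext.Current is null, then block on the result.
    public static int RunOnThreadPool()
    {
        return Task.Run(() => FetchValueAsync()).Result;
    }

    // Pattern 2: clear the current context while the async method starts,
    // so its awaits capture no context, then restore it before blocking.
    public static int RunWithoutContext()
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(null);
        Task<int> task;
        try
        {
            task = FetchValueAsync();
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
        return task.Result;
    }
}
```

Both helpers block the calling thread until the task finishes, so they trade a thread for simplicity. Note that `.Result` and `.Wait()` wrap failures in an `AggregateException`.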

I also like the idea with the boolean flag. It is another possible trade-off. Most applications do not care about optimizing performance in these small ways. They want correctness and developer productivity. Async is bad for both in many cases.
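As a rough, self-contained sketch of the boolean-flag idea, here is one shared core method serving both a sync and an async entry point. The `CharCounter` class and its member names are hypothetical; only `TextReader.Read` and `TextReader.ReadAsync` are real APIs. When the flag is true, no await ever sees an incomplete task, so the core method runs to completion synchronously and the returned task is already finished.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical example class: counts the characters left in a TextReader.
public sealed class CharCounter
{
    private readonly TextReader _reader;

    public CharCounter(TextReader reader)
    {
        _reader = reader;
    }

    // Sync entry point: the core completes synchronously when sync is true,
    // so GetResult() returns without blocking on pending work.
    public int CountChars()
    {
        return CountCharsCore(true).GetAwaiter().GetResult();
    }

    // Async entry point: uses real async reads.
    public Task<int> CountCharsAsync()
    {
        return CountCharsCore(false);
    }

    // Shared logic; only the read call differs between the two modes.
    private async Task<int> CountCharsCore(bool sync)
    {
        var buffer = new char[1024];
        int total = 0;
        while (true)
        {
            int read = sync
                ? _reader.Read(buffer, 0, buffer.Length)
                : await _reader.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
            if (read == 0)
            {
                return total;
            }
            total += read;
        }
    }
}
```

The cost is that the sync path still pays for the async state machine and a Task allocation, which is the trade-off discussed above.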

If you want an async method to be callable in an arbitrary way, then it must use ConfigureAwait(false), which is recommended for library code anyway. Then you can just use Wait() without danger. I also want to point out that async IO does not change the speed of the actual work (DB, web service) in any way. It also adds CPU call overhead (more, not less overhead). Any perf gains can only come from increasing parallelism. Sync code can do parallelism also. Async is only superior if the parallelism is so high that threading cannot reasonably be used (hundreds).

There are some additional ways that async can increase performance, but those are quite minor and occur in specialized circumstances. In general, you will find normal synchronous calls to be faster. I know that because I tried it and also from theoretical observations.

Saving threads is pointless when there is not a shortage of threads. Most (not all) server apps are not short on threads in any way. A thread is just 1MB of memory and 1ms of CPU overhead. Usually there are ample threads available to handle incoming requests and other work. We have programmed our applications with sync IO for the last 20 years and it was totally fine.

I want to clarify that sync over async usually has more overhead because it combines the overhead from async with the overhead of waiting on a Task. A purely synchronous call chain uses less CPU than a purely async call chain in almost all cases, though. But then again these little performance differences do not matter in almost all cases. So we should optimize for developer productivity.

Great cases for async are IOs which are long running and frequent. Also, large degrees of parallelism (for example, if you want to connect to 1 million chat clients over TCP, or you are querying a web service with 100 parallel connections). Here, async IO has meaningful performance and reliability gains. Task + await is an awesome way to implement this. await plus async IO also is very nice in client GUI apps. I do not want to create the impression that I'm against async categorically.

But you can also flexibly transition out of async. E.g. Task.WaitAll(arrayWith100IOTasks) would burn just one thread waiting for 100 parallel IOs. That way you avoid infecting the entire call stack and you save 99 threads. In GUI apps you can often do await Task.Run(() => LotsOfCodeUsingSyncIO()). Again, just one place infected with async and you have nice code.
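A minimal sketch of that transition out of async: many IO tasks are started, and a single synchronous caller blocks once on all of them. `SimulatedIoAsync` is a hypothetical placeholder (a delay standing in for a real network or disk call), and `ParallelIo.RunAll` is an illustrative name.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelIo
{
    // Hypothetical stand-in for one async IO call.
    private static async Task<int> SimulatedIoAsync(int id)
    {
        await Task.Delay(50).ConfigureAwait(false);
        return id * 2;
    }

    // Synchronous method: starts all operations, then blocks one thread
    // until every task has completed.
    public static int[] RunAll(int count)
    {
        Task<int>[] tasks = Enumerable.Range(0, count)
            .Select(i => SimulatedIoAsync(i))
            .ToArray();

        Task.WaitAll(tasks);

        // All tasks are complete here, so reading Result does not block.
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

Callers of `RunAll` stay fully synchronous; only this one method knows that tasks are involved.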

usr answered Oct 13 '22 03:10


Would it then be ok to have a wrapper of the synchronous to be just Task.Result or Wait()?

You have to understand what async IO is all about. It's not about code duplication, it's about taking advantage of the fact that you don't need any threads when the work is naturally asynchronous.

If you'd wrap your synchronous code with a task, you're missing out on that advantage. Also, you'd be misleading your API callers when they'd assume the awaited call would yield control back to the caller.

Edit:

Your examples strengthen my point. Don't force the use of tasks. Synchronous APIs by themselves are completely fine; don't force the TPL into them when it isn't needed, even if it causes your codebase to grow to 2x the number of lines.

Take the time to implement your async api correctly. Don't block on async code, keep it flowing all the way to the bottom of the stack.

Yuval Itzchakov answered Oct 13 '22 01:10