I wanted to test the overhead ascribed to a program by using await/async.
To test this, I wrote the following test class:
public class Entity : INotifyCompletion {
private Action continuation;
private int i;
public void OnCompleted(Action continuation) {
this.continuation = continuation;
}
public Entity GetAwaiter() {
return this;
}
public Entity GetResult() {
return this;
}
public bool IsCompleted { get { return true; } }
public void Execute() {
if (i > 0) Console.WriteLine("What");
}
}
And then I wrote a test harness. The test harness iterates through TestA and TestB 1600 times, measuring the latter 1500 times only (to allow the JIT to 'warm up'). set
is a collection of Entity objects (but the implementation is irrelevant). There are 50,000 entities in the set. The test harness uses the Stopwatch
class for testing.
private static void DoTestA() {
Entity[] objects = set.GetElements();
Parallel.For(0, objects.Length, async i => {
Entity e = objects[i];
if (e == null) return;
(await e).Execute();
});
}
private static void DoTestB() {
Entity[] objects = set.GetElements();
Parallel.For(0, objects.Length, i => {
Entity e = objects[i];
if (e == null) return;
e.Execute();
});
}
The two routines are identical, except one is awaiting the entity before calling Execute() (Execute()
does nothing useful, it's just some dumb code to make sure the processor is really doing something for each Entity).
After executing my test in release mode targeting AnyCPU, I get the following output:
>>> 1500 repetitions >>> IN NANOSECONDS (1000ns = 0.001ms)
Method Avg. Min. Max. Jitter Total
A 1,301,465ns 1,232,200ns 2,869,000ns 1,567,534ns ! 1952.199ms
B 130,053ns 116,000ns 711,200ns 581,146ns ! 195.081ms
As you can see, the method with the await in it is about 10 times slower.
The thing is, as far as I know, there is nothing 'to' await - GetResult
is always true. Does this mean that the state machine is executed even if the awaited 'thing' is already ready?
If so, is there any way around this? I'd like to use the semantics of async/await but this overhead is too high for my application...
EDIT: Adding full benchmark code after requested:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace CSharpPerfTest {
public class Entity : INotifyCompletion {
private Action continuation;
private int i;
public void OnCompleted(Action continuation) {
this.continuation = continuation;
}
public Entity GetAwaiter() {
return this;
}
public Entity GetResult() {
return this;
}
public bool IsCompleted { get { return true; } }
public void Execute() {
if (i > 0) Console.WriteLine("What");
}
}
static class Program {
static ConcurrentSet<Entity> set;
const int MAX_ELEMENTS = 50000;
// Called once before all testing begins
private static void OnceBefore() {
set = new ConcurrentSet<Entity>();
Parallel.For(0, MAX_ELEMENTS, i => {
set.Add(new Entity());
});
}
// Called twice each repetition, once before DoTestA and once before DoTestB
private static void PreTest() {
}
private static void DoTestA() {
Entity[] objects = set.GetElements();
Parallel.For(0, objects.Length, async i => {
Entity e = objects[i];
if (e == null) return;
(await e).Execute();
});
}
private static void DoTestB() {
Entity[] objects = set.GetElements();
Parallel.For(0, objects.Length, i => {
Entity e = objects[i];
if (e == null) return;
e.Execute();
});
}
private const int REPETITIONS = 1500;
private const int JIT_WARMUPS = 10;
#region Test Harness
private static double[] aTimes = new double[REPETITIONS];
private static double[] bTimes = new double[REPETITIONS];
private static void Main(string[] args) {
Stopwatch stopwatch = new Stopwatch();
OnceBefore();
for (int i = JIT_WARMUPS * -1; i < REPETITIONS; ++i) {
Console.WriteLine("Starting repetition " + i);
PreTest();
stopwatch.Restart();
DoTestA();
stopwatch.Stop();
if (i >= 0) aTimes[i] = stopwatch.Elapsed.TotalMilliseconds;
PreTest();
stopwatch.Restart();
DoTestB();
stopwatch.Stop();
if (i >= 0) bTimes[i] = stopwatch.Elapsed.TotalMilliseconds;
}
DisplayScores();
}
private static void DisplayScores() {
Console.WriteLine();
Console.WriteLine();
bool inNanos = false;
if (aTimes.Average() < 10 || bTimes.Average() < 10) {
inNanos = true;
for (int i = 0; i < aTimes.Length; ++i) aTimes[i] *= 1000000;
for (int i = 0; i < bTimes.Length; ++i) bTimes[i] *= 1000000;
}
Console.WriteLine(">>> " + REPETITIONS + " repetitions >>> " + (inNanos ? "IN NANOSECONDS (1000ns = 0.001ms)" : "IN MILLISECONDS (1000ms = 1s)"));
Console.WriteLine("Method Avg. Min. Max. Jitter Total");
Console.WriteLine(
"A "
+ (String.Format("{0:N0}", (long) aTimes.Average()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
+ (String.Format("{0:N0}", (long) aTimes.Min()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
+ (String.Format("{0:N0}", (long) aTimes.Max()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
+ (String.Format("{0:N0}", (long) Math.Max(aTimes.Average() - aTimes.Min(), aTimes.Max() - aTimes.Average())) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
+ ((long) aTimes.Sum() >= 10000 && inNanos ? "! " + String.Format("{0:f3}", aTimes.Sum() / 1000000) + "ms" : (long) aTimes.Sum() + (inNanos ? "ns" : "ms"))
);
Console.WriteLine(
"B "
+ (String.Format("{0:N0}", (long) bTimes.Average()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
+ (String.Format("{0:N0}", (long) bTimes.Min()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
+ (String.Format("{0:N0}", (long) bTimes.Max()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
+ (String.Format("{0:N0}", (long) Math.Max(bTimes.Average() - bTimes.Min(), bTimes.Max() - bTimes.Average())) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
+ ((long) bTimes.Sum() >= 10000 && inNanos ? "! " + String.Format("{0:f3}", bTimes.Sum() / 1000000) + "ms" : (long) bTimes.Sum() + (inNanos ? "ns" : "ms"))
);
Console.ReadKey();
}
#endregion
}
}
I found out that running async-await can be much slower in some scenarios. But if I click on the 'both' button, the 'await' version is ~3-4 times slower than the promises version.
Don't ever use await with forEach . Use a for-loop (or any loop without a callback) instead. Don't await inside filter and reduce . Always await an array of promises with map , then filter or reduce accordingly.
The main benefits of asynchronous programming using async / await include the following: Increase the performance and responsiveness of your application, particularly when you have long-running operations that do not require to block the execution.
Because await is only valid inside async functions and modules, which themselves are asynchronous and return promises, the await expression never blocks the main thread and only defers execution of code that actually depends on the result, i.e. anything after the await expression.
If your function has a response time that 1ms for 50,000 calls is considered significant you should not be awaiting that code and instead be running it synchronously.
Using async code has a small overhead, it has to add a function calls for the state machine that drives it internally. If the work you are making async is also small in comparison to the overhead costs of running the state machine you should make the code you need to rethink if your code should be async.
Converted to an answer from the comments: apparently, this is not a clean benchmark test.
If you do not need asynchronous continuation, just don't use it. The code is always faster without it. If you do need it, then expect some overhead. You should understand what's going on behind the scene when you use a particular language/runtime feature.
Even if you remove Parallel.For
, convert lambdas into methods and prevent inlining, there still will be some struct
copying and await
continuation closure allocation, to support the state machine functionality (an example of the generated code).
A more fair benchmark would be to test async/await
vs. alternative implementation using callback closures and Task.ContinueWith
, on a thread without synchronization context. I would not expect any significant difference in this case.
On a side note, you're passing an async void
lambda Action
into Parallel.For
. You should be aware that the execution control will return to Parallel.For
as soon as there's the 1st await
inside the lambda, and then it's essentially a fire-and-forget call outside Parallel.For
. I really can't think of any useful scenarios for this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With