Tests show "await" is significantly slower, even when object being awaited is already Complete

Tags:

I wanted to test the overhead ascribed to a program by using await/async.

To test this, I wrote the following test class:

public class Entity : INotifyCompletion {
    private Action continuation;
    private int i;

    public void OnCompleted(Action continuation) {
        this.continuation = continuation;
    }

    public Entity GetAwaiter() {
        return this;
    }

    public Entity GetResult() {
        return this;
    }

    public bool IsCompleted { get { return true; } }

    public void Execute() {
        if (i > 0) Console.WriteLine("What");
    }
}

And then I wrote a test harness. The test harness iterates through TestA and TestB 1600 times, measuring the latter 1500 times only (to allow the JIT to 'warm up'). set is a collection of Entity objects (but the implementation is irrelevant). There are 50,000 entities in the set. The test harness uses the Stopwatch class for testing.

private static void DoTestA() {
    Entity[] objects = set.GetElements();
    Parallel.For(0, objects.Length, async i => {
        Entity e = objects[i];
        if (e == null) return;

        (await e).Execute();
    });
}

private static void DoTestB() {
    Entity[] objects = set.GetElements();
    Parallel.For(0, objects.Length, i => {
        Entity e = objects[i];
        if (e == null) return;

        e.Execute();
    });
}

The two routines are identical, except one is awaiting the entity before calling Execute() (Execute() does nothing useful, it's just some dumb code to make sure the processor is really doing something for each Entity).

After executing my test in release mode targeting AnyCPU, I get the following output:

>>> 1500 repetitions >>> IN NANOSECONDS (1000ns = 0.001ms)
Method   Avg.         Min.         Max.         Jitter       Total
A        1,301,465ns  1,232,200ns  2,869,000ns  1,567,534ns  ! 1952.199ms
B        130,053ns    116,000ns    711,200ns    581,146ns    ! 195.081ms

As you can see, the method with the await in it is about 10 times slower.

The thing is, as far as I know, there is nothing 'to' await - GetResult is always true. Does this mean that the state machine is executed even if the awaited 'thing' is already ready?

If so, is there any way around this? I'd like to use the semantics of async/await but this overhead is too high for my application...

EDIT: Adding full benchmark code after requested:

Program.cs

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace CSharpPerfTest {
    public class Entity : INotifyCompletion {
        private Action continuation;
        private int i;

        public void OnCompleted(Action continuation) {
            this.continuation = continuation;
        }

        public Entity GetAwaiter() {
            return this;
        }

        public Entity GetResult() {
            return this;
        }

        public bool IsCompleted { get { return true; } }

        public void Execute() {
            if (i > 0) Console.WriteLine("What");
        }
    }

    static class Program {
        static ConcurrentSet<Entity> set;
        const int MAX_ELEMENTS = 50000;

        // Called once before all testing begins
        private static void OnceBefore() {
            set = new ConcurrentSet<Entity>();

            Parallel.For(0, MAX_ELEMENTS, i => {
                set.Add(new Entity());
            });
        }

        // Called twice each repetition, once before DoTestA and once before DoTestB
        private static void PreTest() {

        }

        private static void DoTestA() {
            Entity[] objects = set.GetElements();
            Parallel.For(0, objects.Length, async i => {
                Entity e = objects[i];
                if (e == null) return;
                (await e).Execute();
            });
        }

        private static void DoTestB() {
            Entity[] objects = set.GetElements();
            Parallel.For(0, objects.Length, i => {
                Entity e = objects[i];
                if (e == null) return;
                e.Execute();
            });
        }

        private const int REPETITIONS = 1500;
        private const int JIT_WARMUPS = 10;

        #region Test Harness
        private static double[] aTimes = new double[REPETITIONS];
        private static double[] bTimes = new double[REPETITIONS];

        private static void Main(string[] args) {
            Stopwatch stopwatch = new Stopwatch();

            OnceBefore();

            for (int i = JIT_WARMUPS * -1; i < REPETITIONS; ++i) {
                Console.WriteLine("Starting repetition " + i);

                PreTest();
                stopwatch.Restart();
                DoTestA();
                stopwatch.Stop();
                if (i >= 0) aTimes[i] = stopwatch.Elapsed.TotalMilliseconds;

                PreTest();
                stopwatch.Restart();
                DoTestB();
                stopwatch.Stop();
                if (i >= 0) bTimes[i] = stopwatch.Elapsed.TotalMilliseconds;
            }

            DisplayScores();
        }

        private static void DisplayScores() {
            Console.WriteLine();
            Console.WriteLine();

            bool inNanos = false;
            if (aTimes.Average() < 10 || bTimes.Average() < 10) {
                inNanos = true;
                for (int i = 0; i < aTimes.Length; ++i) aTimes[i] *= 1000000;
                for (int i = 0; i < bTimes.Length; ++i) bTimes[i] *= 1000000;
            }

            Console.WriteLine(">>> " + REPETITIONS + " repetitions >>> " + (inNanos ? "IN NANOSECONDS (1000ns = 0.001ms)" : "IN MILLISECONDS (1000ms = 1s)"));
            Console.WriteLine("Method   Avg.         Min.         Max.         Jitter       Total");

            Console.WriteLine(
            "A        "
            + (String.Format("{0:N0}", (long) aTimes.Average()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
            + (String.Format("{0:N0}", (long) aTimes.Min()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
            + (String.Format("{0:N0}", (long) aTimes.Max()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
            + (String.Format("{0:N0}", (long) Math.Max(aTimes.Average() - aTimes.Min(), aTimes.Max() - aTimes.Average())) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
            + ((long) aTimes.Sum() >= 10000 && inNanos ? "! " + String.Format("{0:f3}", aTimes.Sum() / 1000000) + "ms" : (long) aTimes.Sum() + (inNanos ? "ns" : "ms"))
            );
            Console.WriteLine(
            "B        "
            + (String.Format("{0:N0}", (long) bTimes.Average()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
            + (String.Format("{0:N0}", (long) bTimes.Min()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
            + (String.Format("{0:N0}", (long) bTimes.Max()) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
            + (String.Format("{0:N0}", (long) Math.Max(bTimes.Average() - bTimes.Min(), bTimes.Max() - bTimes.Average())) + (inNanos ? "ns" : "ms")).PadRight(13, ' ')
            + ((long) bTimes.Sum() >= 10000 && inNanos ? "! " + String.Format("{0:f3}", bTimes.Sum() / 1000000) + "ms" : (long) bTimes.Sum() + (inNanos ? "ns" : "ms"))
            );

            Console.ReadKey();
        }
        #endregion

    }
}

260

asked Apr 04 '14 18:04

Xenoprimate

2 Answers

If your function has a response time that 1ms for 50,000 calls is considered significant you should not be awaiting that code and instead be running it synchronously.

Using async code has a small overhead, it has to add a function calls for the state machine that drives it internally. If the work you are making async is also small in comparison to the overhead costs of running the state machine you should make the code you need to rethink if your code should be async.

176

answered Nov 14 '22 21:11

Scott Chamberlain

Converted to an answer from the comments: apparently, this is not a clean benchmark test.

If you do not need asynchronous continuation, just don't use it. The code is always faster without it. If you do need it, then expect some overhead. You should understand what's going on behind the scene when you use a particular language/runtime feature.

Even if you remove Parallel.For, convert lambdas into methods and prevent inlining, there still will be some struct copying and await continuation closure allocation, to support the state machine functionality (an example of the generated code).

A more fair benchmark would be to test async/await vs. alternative implementation using callback closures and Task.ContinueWith, on a thread without synchronization context. I would not expect any significant difference in this case.

On a side note, you're passing an async void lambda Action into Parallel.For. You should be aware that the execution control will return to Parallel.For as soon as there's the 1st await inside the lambda, and then it's essentially a fire-and-forget call outside Parallel.For. I really can't think of any useful scenarios for this.

answered Nov 14 '22 20:11

noseratio

Related questions
                            
                                Proper approach to dealing with Auto-scaling
                            
                                Why doesn't C#'s overload resolution work between Func<T,T> and Action<T>? [duplicate]
                            
                                How does the C# compiler decide to emit retargetable assembly references?
                            
                                how to get started in writing this Orchard module
                            
                                Why C# LINQ expressions must end with Select or Group By Clause where as no such restriction in VB.Net
                            
                                How to add a C++ DLL in Windows Phone 8 (C#) Framework
                            
                                Trying to solve telephone word more elegantly with recursion
                            
                                EF, Code First - How to set a custom Guid identity value on insert
                            
                                Delay event handling until events have been fired
                            
                                What are the pros and cons of writing C#/Xaml vs. HTML/JavaScript WinRT applications in Windows8 [closed]
                            
                                The CA2104 warning: Is there any way to mark a class as `Immutable` to suppress it?
                            
                                Loop implementation of List.Contains() appears faster than the built-in one. Is it? If so, why?
                            
                                Handling Layout properties with custom Razor view engine
                            
                                How to traverse a dacpac
                            
                                Windows 7 Environment Variable for System32 or SysWOW64
                            
                                DataTemplate key is ignored when specified with x:Static
                            
                                Web API 2 routing attributes not working
                            
                                Prevent LocalReport.Render using Calibri Font Ligatures
                            
                                How to stop the validation trigger to start automatically in wpf
                            
                                Generic method where T implements Interface<T>

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Tests show "await" is significantly slower, even when object being awaited is already Complete

Tags:

performance

c#

.net

asynchronous

async-await