
System.Runtime.CompilerServices.AsyncTaskMethodBuilder::Start has significant self time

I'm profiling our C# .NET application, and I'm noticing that the method System.Runtime.CompilerServices.AsyncTaskMethodBuilder::Start is appearing multiple times, taking up about 3-4 seconds of Self Time in my 1 minute sample (meaning that it's spending about 3-4 seconds in the task infrastructure).

I understand that this method is used by the compiler to implement the async/await language construct in C#. What, generally, is in it that would cause it to block or otherwise take up a lot of time? Is there any way to improve our method so that it spends less time in this infrastructure?

EDIT: Here's a somewhat lengthy, but still self-contained, code sample demonstrating the problem, essentially doing a parallel merge sort on two very large arrays:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace AsyncAwaitSelfTimeTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Random random = new Random();

            int[] arrayOne = GenerateArray(50_000_000, random.Next);
            double[] arrayTwo = GenerateArray(50_000_000, random.NextDouble);

            Comparer<int> comparerOne = Comparer<int>.Create((a, b) =>
            {
                if (a < b) return -1;
                else if (a > b) return 1;
                else return 0;
            });
            Comparer<double> comparerTwo = Comparer<double>.Create((a, b) =>
            {
                if (a < b) return -1;
                else if (a > b) return 1;
                else return 0;
            });

            var sortTaskOne = Task.Run(() => MergeSort(arrayOne, 0, arrayOne.Length, comparerOne));
            var sortTaskTwo = Task.Run(() => MergeSort(arrayTwo, 0, arrayTwo.Length, comparerTwo));

            Task.WaitAll(sortTaskOne, sortTaskTwo);
            Console.Write("done sorting");
        }

        static T[] GenerateArray<T>(int length, Func<T> getFunc)
        {
            T[] result = new T[length];
            for (int i = 0; i < length; i++)
            {
                result[i] = getFunc();
            }
            return result;
        }

        static async Task MergeSort<T>(T[] array, int start, int end, Comparer<T> comparer)
        {
            if (end - start <= 16)
            {
                SelectionSort(array, start, end, comparer);
            }
            else
            {
                int mid = start + (end - start) / 2;

                Task firstTask = Task.Run(() => MergeSort(array, start, mid, comparer));
                Task secondTask = Task.Run(() => MergeSort(array, mid, end, comparer));

                await Task.WhenAll(firstTask, secondTask);

                int firstIndex = start;
                int secondIndex = mid;
                T[] dest = new T[end - start];
                for (int i = 0; i < dest.Length; i++)
                {
                    if (firstIndex >= mid)
                    {
                        dest[i] = array[secondIndex++];
                    }
                    else if (secondIndex >= end)
                    {
                        dest[i] = array[firstIndex++];
                    }
                    else if (comparer.Compare(array[firstIndex], array[secondIndex]) < 0)
                    {
                        dest[i] = array[firstIndex++];
                    }
                    else
                    {
                        dest[i] = array[secondIndex++];
                    }
                }

                dest.CopyTo(array, start);
            }
        }

        static void SelectionSort<T>(T[] array, int start, int end, Comparer<T> comparer)
        {
            // note: using selection sort here to prevent time variability
            for (int i = start; i < end; i++)
            {
                int minIndex = i;
                for (int j = i + 1; j < end; j++)
                {
                    if (comparer.Compare(array[j], array[minIndex]) < 0)
                    {
                        minIndex = j;
                    }
                }
                T temp = array[i];
                array[i] = array[minIndex];
                array[minIndex] = temp;
            }
        }
    }
}

In the performance profile of this code, the two copies of System.Runtime.CompilerServices.AsyncTaskMethodBuilder::Start (one for each generic value type) take up most of the self-processor time, with the two MergeSort methods only taking up a very small fraction of the self-processor time. Similar behavior was also noted when Task.Run was not used (thus utilizing only a single processor).

EDIT 2: Thank you very much for your answers so far. I originally thought that the fact that Task<TResult> was being used was part of the problem (since it was being used in the original code), hence my structure of copying arrays instead of sorting in place. However, I recognize now that that is irrelevant, so I changed the code snippet above to instead do a merge sort in place. I have also reduced the task count by introducing a non-trivial sequential cutoff (doing selection sort for the sake of tightly bound time), as well as using Comparer objects to prevent boxing allocations of the array elements (and thus reducing the profiling interference caused by the garbage collector).

However, the same pattern, that of AsyncTaskMethodBuilder::Start taking significant self time, still exists and can still be found in the profiling results.

EDIT 3: To clarify, the answer I am/was looking for is not "why is this code slow?", but "why is the .NET profiler telling me that most of the cost is spent inside a method I have no control over?" The accepted answer helped me identify the problem, which was that most of the logic was inside a generated type that the profiler does not include.

asked Jun 09 '17 by TheHans255

2 Answers

The problem here is that you spawn so many tasks that you overload the thread pool, causing .NET to inject additional worker threads, since you keep creating new tasks all the way down to subarrays of length 1. AsyncTaskMethodBuilder::Start starts being a significant time consumer once the runtime has to queue new work items faster than the pool threads can service them.

There are a couple of things you need to change to get some performance into your function:

First: Clean up your awaits

Task<T[]> firstTask = Task.Run(() => MergeSort(firstHalf));
Task<T[]> secondTask = Task.Run(() => MergeSort(secondHalf));

await Task.WhenAll(firstTask, secondTask);

T[] firstDest = await firstTask;
T[] secondDest = await secondTask;

This is already a problem. Remember that every await counts. Even if the Task is already completed, await still performs a completion check and state-machine bookkeeping; if the Task is not yet complete, the method is split at this point and the remainder is scheduled as a continuation. This switch takes time. Not much on its own, but it happens so often in your function that it becomes measurable.

Task.WhenAll already returns the result value you need.

Task<T[]> firstTask = Task.Run(() => MergeSort(firstHalf));
Task<T[]> secondTask = Task.Run(() => MergeSort(secondHalf));

T[][] dests = await Task.WhenAll(firstTask, secondTask);

T[] firstDest = dests[0];
T[] secondDest = dests[1];

This way you can reduce the number of task switches in your function.

Second: Reduce the number of Task instances created.

Tasks are a great tool to distribute work across your different CPU cores, but there is a cost to creating a new Task, and you have to be sure that it is worth it.

I suggest adding a threshold to the point where new Tasks are created. If the portions you are working on get too small, you shouldn't create new Task instances, but instead call the functions directly.

For example:

T[] firstDest;
T[] secondDest;
if (mid > 100) 
{
  Task<T[]> firstTask = Task.Run(() => MergeSort(firstHalf));
  Task<T[]> secondTask = Task.Run(() => MergeSort(secondHalf));

  T[][] dests = await Task.WhenAll(firstTask, secondTask);

  firstDest = dests[0];
  secondDest = dests[1];
} 
else 
{
  firstDest = MergeSort(firstHalf);
  secondDest = MergeSort(secondHalf);
}

You should try different values and see how this changes things. 100 is just a value I'd start with, but you can choose any other. This will reduce the number of tasks that don't have much to do by a significant amount. Basically, the value determines how much remaining work is acceptable for one task to handle.

Lastly, you should think about handling your array instances differently. If you tell your function the start position and the length of the portion of the array it is expected to work on, you should be able to improve the performance even more, because you don't have to copy the arrays thousands of times.
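A rough sketch of that signature change (sequential here for clarity, and with illustrative names; the temporary merge buffer is the only remaining allocation):

```csharp
// Hypothetical sketch: sort array[start .. start+length) in place,
// so no sub-array copies are ever made.
static void MergeSortInPlace<T>(T[] array, int start, int length, Comparer<T> comparer)
{
    if (length <= 1) return;

    int mid = length / 2;
    MergeSortInPlace(array, start, mid, comparer);
    MergeSortInPlace(array, start + mid, length - mid, comparer);

    // Merge the two sorted halves through a temporary buffer.
    T[] buffer = new T[length];
    int left = start, right = start + mid, end = start + length;
    for (int i = 0; i < length; i++)
    {
        if (left >= start + mid)
            buffer[i] = array[right++];
        else if (right >= end)
            buffer[i] = array[left++];
        else if (comparer.Compare(array[left], array[right]) <= 0)
            buffer[i] = array[left++];
        else
            buffer[i] = array[right++];
    }
    buffer.CopyTo(array, start);
}
```

This is essentially the structure the question's updated code sample already uses; combining it with the task threshold above gives you both fewer allocations and fewer tasks.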

answered Oct 17 '22 by Nitram


Interesting. I took your sample and altered the MergeSort async method to be non-async. Now a profiling session takes ~33 sec to complete (vs ~36 sec for the async version, both using Release config). The non-async version looks like this:

    static Task<T[]> MergeSort<T>(T[] src) where T : IComparable<T>
    {
        if (src.Length <= 1)
        {
            return Task.FromResult(src);
        }
        else
        {
            int mid = src.Length / 2;
            T[] firstHalf = new T[mid];
            T[] secondHalf = new T[src.Length - mid];
            Array.Copy(src, 0, firstHalf, 0, mid);
            Array.Copy(src, mid, secondHalf, 0, src.Length - mid);

            Task<T[]> firstTask = Task.Run(() => MergeSort(firstHalf));
            Task<T[]> secondTask = Task.Run(() => MergeSort(secondHalf));

            return Task.WhenAll(firstTask, secondTask).ContinueWith(
                continuationFunction: _ =>
                {
                    T[] firstDest = firstTask.Result;
                    T[] secondDest = secondTask.Result;
                    int firstIndex = 0;
                    int secondIndex = 0;

                    T[] dest = new T[src.Length];
                    for (int i = 0; i < dest.Length; i++)
                    {
                        if (firstIndex >= firstDest.Length)
                        {
                            dest[i] = secondDest[secondIndex++];
                        }
                        else if (secondIndex >= secondDest.Length)
                        {
                            dest[i] = firstDest[firstIndex++];
                        }
                        else if (firstDest[firstIndex].CompareTo(secondDest[secondIndex]) < 0)
                        {
                            dest[i] = firstDest[firstIndex++];
                        }
                        else
                        {
                            dest[i] = secondDest[secondIndex++];
                        }
                    }

                    return dest;
                },
                cancellationToken: System.Threading.CancellationToken.None,
                continuationOptions: TaskContinuationOptions.ExecuteSynchronously,
                scheduler: TaskScheduler.Default);
        }
    }

So, for this particular example, the async/await overhead seems to be ~3 seconds. This is more than I expected, but it's certainly not a bottleneck here.

Regarding this observation:

In the performance profile of this code, the two copies of System.Runtime.CompilerServices.AsyncTaskMethodBuilder::Start (one for each generic value type) take up most of the self-processor time, with the two MergeSort methods only taking up a very small fraction of the self-processor time.

I haven't analyzed the compiler-generated code for this particular async method, but I suspect MergeSort just contains a short prolog/epilog code, while the actual CPU-intensive code is indirectly invoked by AsyncTaskMethodBuilder::Start.
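Roughly speaking, the compiler rewrites an async method into a state machine; the visible method shrinks to a stub that hands that state machine to the builder. A simplified sketch (names are illustrative, and the real generated code differs in detail):

```csharp
// Simplified sketch of the async/await compiler transformation.
static Task MergeSort<T>(T[] array, int start, int end, Comparer<T> comparer)
{
    var stateMachine = new MergeSortStateMachine<T>();
    // ... captured parameters are copied into state machine fields ...
    stateMachine.builder = AsyncTaskMethodBuilder.Create();

    // Start() invokes stateMachine.MoveNext() synchronously, so everything
    // in the method body up to the first real suspension point runs "inside"
    // Start -- and the profiler attributes that CPU time to Start, not to
    // the MergeSort stub.
    stateMachine.builder.Start(ref stateMachine);
    return stateMachine.builder.Task;
}

struct MergeSortStateMachine<T> : System.Runtime.CompilerServices.IAsyncStateMachine
{
    public System.Runtime.CompilerServices.AsyncTaskMethodBuilder builder;

    public void MoveNext()
    {
        // The original method body lives here, split at each await.
    }

    public void SetStateMachine(System.Runtime.CompilerServices.IAsyncStateMachine sm) { }
}
```

That would explain the profile: the heavy merge loop executes inside MoveNext, which the profiler reaches through AsyncTaskMethodBuilder::Start rather than through a frame named MergeSort.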

answered Oct 17 '22 by noseratio