Performance Metrics/Diagnostics of .NET Tasks

Tags: c#, .net, async-await, task

Is there a way to get data out from .NET about the number of (C# 5 or newer, so post-async/await) Tasks pending execution, and similar metrics, for use in diagnosing issues happening to production servers?

The case I'm discussing is an async-all-the-way-down system (for example, a massively parallel socket server where each request runs async from the start) where the initial Task either spawns multiple Tasks that each take time to process (or each start more Tasks), or spawns Tasks some of which block (such as third-party code) and some of which are correctly async. There are two situations I've seen that are difficult to diagnose usefully:

1. Under normal load, everything works fine, but with enough requests the CPU very quickly jumps to 100% and all requests finish slower and slower. When the load lessens, the CPU stays at 100% until most of the pending tasks have gradually finished, and then drops back to normal levels.
2. Under normal load, everything works fine, but with enough requests certain requests (all of which are properly async) either never finish or finish very slowly. When the load lessens, the CPU stays at 100% while they are all processed, but there are speed bumps in the task completion rate where it slows down considerably for a short time.

I've tried writing a simple test for this, but there is no apparent way to limit the number of executors, and the number of tasks I need to create to reproduce the problem makes the output very hard to parse. It's also quite hard to avoid interfering with the test itself when logging debug information. I will keep trying to create a better test case and amend my question if needed.

From my understanding of the problem and the async task system, both of these are effectively contention on the executors that actually run the tasks.

The first case happens because more Tasks are being created than are actually finishing, in which case a counter of pending tasks would be useful to diagnose this even before the load is high enough to lock up the service.

The second case happens because a certain set of Tasks goes long enough without yielding that over time (with enough load) all the executors end up running these tasks at the same time. And as soon as one finishes, it'll process some Tasks, only to be replaced by another long-running task soon enough. In this case, a pending tasks counter would be useful, as well as some other metrics.

Is anything of the sort available, or is there some undocumented/hacky way to graft a bit of code to the start/end of every task started in an application, to make it log out/measure these things and throw warnings when the task number is exploding?


asked May 17 '18 18:05

Mike Roibu

1 Answer

You can derive a class from EventListener to handle the events that the Task Parallel Library produces. This way you can probably count queued and running tasks and store profiling information associated with each task in a ConcurrentDictionary. However, there are complications, such as the non-uniqueness of task IDs and the performance impact of this kind of profiling.

Example implementation:

using System;
using System.Diagnostics.Tracing;
using System.Linq;
using System.Text;

public class TplEventListener : EventListener
{
    static readonly Guid _tplSourceGuid = new Guid("2e5dba47-a3d2-4d16-8ee0-6671ffdcd7b5");
    readonly EventLevel _handledEventsLevel;

    public TplEventListener(EventLevel handledEventsLevel)
    {
        _handledEventsLevel = handledEventsLevel;
    }

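    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Note: for event sources that already exist, this can be called from the base
        // EventListener constructor, i.e. before _handledEventsLevel has been assigned
        // (it is then still 0, which equals EventLevel.LogAlways).
        if (eventSource.Guid == _tplSourceGuid)
            EnableEvents(eventSource, _handledEventsLevel);
    }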

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventSource.Guid != _tplSourceGuid)
            return;

        switch (eventData.EventId)
        {
            // TODO: Add case for each relevant EventId (such as TASKSCHEDULED_ID and TASKWAITBEGIN_ID)
            // and explore relevant data (such as task Id) in eventData.Payload. Payload is described by 
            // eventData.PayloadNames.
            // For event ids and payload meaning explore TplEtwProvider source code 
            // (https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/TPLETWProvider.cs,183).
            default:
                var message = new StringBuilder();
                message.Append(eventData.EventName);
                message.Append("(");
                message.Append(eventData.EventId);
                message.Append(") { ");
                if (!string.IsNullOrEmpty(eventData.Message))
                {
                    message.Append("Message = \"");
                    message.AppendFormat(eventData.Message, eventData.Payload.ToArray());
                    message.Append("\", ");
                }
                for (var i = 0; i < eventData.Payload.Count; ++i)
                {
                    message.Append(eventData.PayloadNames[i]);
                    message.Append(" = ");
                    message.Append(eventData.Payload[i]);
                    message.Append(", ");
                }
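                // Replace the trailing ", " with " }". If nothing was appended
                // (no message and no payload), only drop the trailing space.
                if (message[message.Length - 2] == ',')
                    message.Length -= 2;
                else
                    message.Length -= 1;
                message.Append(" }");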
                Console.WriteLine(message);
                break;
        }
    }
}

Create a new TplEventListener(EventLevel.LogAlways) in each AppDomain, keep it for the lifetime of the application, and you will get a log similar to:

NewID(26) { TaskID = 1 }
TaskScheduled(7) { Message = "Task 1 scheduled to TaskScheduler 1.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 1, CreatingTaskID = 0, TaskCreationOptions = 8192 }
NewID(26) { TaskID = 2 }
TraceOperationBegin(14) { TaskID = 2, OperationName = Task.ContinueWith: < SendAsync > b__0, RelatedContext = 0 }
TaskStarted(8) { Message = "Task 1 executing.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 1 }
AwaitTaskContinuationScheduled(12) { OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, ContinuwWithTaskId = 2 }
NewID(26) { TaskID = 3 }
TraceOperationBegin(14) { TaskID = 3, OperationName = Async: < Main > d__3, RelatedContext = 0 }
NewID(26) { TaskID = 4 }
TaskWaitBegin(10) { Message = "Beginning wait (2) on Task 4.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 4, Behavior = 2, ContinueWithTaskID = 3 }
TaskWaitBegin(10) { Message = "Beginning wait (1) on Task 3.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 3, Behavior = 1, ContinueWithTaskID = 0 }
TraceSynchronousWorkBegin(17) { TaskID = 1, Work = 2 }
TraceSynchronousWorkEnd(18) { Work = 2 }
TraceOperationEnd(15) { TaskID = 1, Status = 1 }
RunningContinuation(20) { TaskID = 1, Object = 0 }
TaskCompleted(9) { Message = "Task 1 completed.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 1, IsExceptional = False }
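
To turn this event stream into the pending-task counter the question asks for, the cases left as a TODO in the listener can simply increment counters. The following is only a minimal sketch: the class name TaskCountingListener and its Queued and Running properties are hypothetical, and the event IDs 7, 8 and 9 are taken from the sample log above rather than from any documented contract. The counts are approximate: not every task raises all three events (a task cancelled before it starts, for example), and tasks scheduled before the listener was enabled are missed.

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Threading;

// Hypothetical helper (not from the original answer): derives approximate
// queued/running task counts from TPL events instead of logging each one.
public sealed class TaskCountingListener : EventListener
{
    // Same TPL event source GUID as in the listener above.
    static readonly Guid TplSourceGuid = new Guid("2e5dba47-a3d2-4d16-8ee0-6671ffdcd7b5");

    // Event IDs as seen in the sample log:
    // TaskScheduled(7), TaskStarted(8), TaskCompleted(9).
    const int TaskScheduledId = 7;
    const int TaskStartedId = 8;
    const int TaskCompletedId = 9;

    long _scheduled;
    long _started;
    long _completed;

    // Tasks handed to a scheduler that have not begun executing yet (approximate).
    public long Queued =>
        Math.Max(0, Interlocked.Read(ref _scheduled) - Interlocked.Read(ref _started));

    // Tasks that have begun executing but have not completed yet (approximate).
    public long Running =>
        Math.Max(0, Interlocked.Read(ref _started) - Interlocked.Read(ref _completed));

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Mirrors the original example, which enables every level.
        if (eventSource.Guid == TplSourceGuid)
            EnableEvents(eventSource, EventLevel.LogAlways);
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventSource.Guid != TplSourceGuid)
            return;

        switch (eventData.EventId)
        {
            case TaskScheduledId: Interlocked.Increment(ref _scheduled); break;
            case TaskStartedId: Interlocked.Increment(ref _started); break;
            case TaskCompletedId: Interlocked.Increment(ref _completed); break;
        }
    }
}
```

A background timer could sample Queued and Running periodically and raise a warning when Queued keeps growing. Because only counters are used, the sketch sidesteps the non-unique task ID problem, but it still pays the cost of having TPL events enabled.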

For more information check:

Async Causality Chain Tracking article by Andrew Stasyuk
A Few Words on Task.Id (and TaskScheduler.Id) article by Stephen Cleary.
How do I listen to TPL TaskStarted/TaskCompleted ETW events discussion on StackOverflow
System.Threading.Tasks.TplEtwProvider source code

answered Nov 04 '22 17:11

Leonid Vasilev
