Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Debugging high CPU ussage on IIS process

High CPU usage on IIS process

I'm currently investigating high CPU usage on one of our production servers, I'm stuck and hope that someone here is able to help. Comparing CPU usage with number of inbound web-service calls (WCF and REST) tells me that they are unrelated, CPU usage seems to climb whenever calls are increasing or dropping.

The setup

  • Windows 2012 R2 server x64
  • IIS 8.5
  • .NET 4.5
  • Single app pool running our app
  • Auto recycle app pool at 00:00

The Problem

Recently our CPU usage has increased drastically, the pattern seems to be that CPU usage climbs from midnight (where pool i recycled) and keeps climbing until pool is recycled again.

CPU trend the last days

The Data

By analyzing taskmanager and using perfmon counters I have been able to confirm that it indeed is our W3WP process that consumes the CPU.

I configured DebugDiag 2.1 to take 3 memory dumps 10 sec apart when CPU consumption raised over a threshold of 50% for more than 10 sec. (normal CPU usage before this problem was 5-10%)

Looking into the dump files using ntsd, I see that several threads consume a lot of CPU:

0:047> !runaway
 User Mode Time
  Thread       Time
  47:2920      0 days 0:24:42.921
  49:1f1c      0 days 0:23:07.796
  52:2ed8      0 days 0:21:38.218
  54:1560      0 days 0:21:37.937
  48:273c      0 days 0:21:19.140
  59:2110      0 days 0:20:56.078
  45:2d90      0 days 0:20:35.906
...
  19:1c88      0 days 0:00:00.000

(Just showing some of the threads here) So I try to see what is going on in these threads as this is not expected behavior. All the threads with long running tasks seems to be managed, BUT when ever i try !clrstack on any of them i get:

0:047> !clrstack
OS Thread Id: 0x2920 (47)
        Child SP               IP Call Site
GetFrameContext failed: 1
0000000000000000 0000000000000000 <unknown>

Which confuses me as I was expecting a managed stack. When I look at the native stack I get something different:

0:047> !dumpstack
OS Thread Id: 0x2920 (47)
Current frame: ntdll!NtWaitForSingleObject+0xa
Child-SP         RetAddr          Caller, Callee
0000009c46eae730 00007fff0c131118 KERNELBASE!WaitForSingleObjectEx+0x94, calling ntdll!NtWaitForSingleObject
0000009c46eae7d0 00007fff051f91eb clr!CLREventWaitHelper2+0x38, calling kernel32!WaitForSingleObjectEx
0000009c46eae7e0 00007fff0c13155c KERNELBASE!SetEvent+0xc, calling ntdll!NtSetEvent
0000009c46eae810 00007fff051f9197 clr!CLREventWaitHelper+0x1f, calling clr!CLREventWaitHelper2
0000009c46eae870 00007fff051f9120 clr!CLREventBase::WaitEx+0x70, calling clr!CLREventWaitHelper
0000009c46eae8b0 00007fff052890e6 clr!SVR::t_join::join+0x106, calling clr!CLREventBase::WaitEx
0000009c46eae900 00007fff053d5913 clr!SVR::gc_heap::bgc_thread_function+0x97, calling clr!CLREventBase::WaitEx
0000009c46eae940 00007fff0533fcb6 clr!Thread::intermediateThreadProc+0x7d
0000009c46eaeb10 00007fff0ef4086d ntdll!RtlAllocateHeap+0x17d, calling ntdll!RtlpAllocateHeap
0000009c46eaec20 00007fff0ef40073 ntdll!RtlpSubSegmentInitialize+0x2f3, calling ntdll!RtlpHeapGenerateRandomValue32
0000009c46eaeca0 00007fff0ef40c65 ntdll!RtlpLowFragHeapAllocFromContext+0x355, calling ntdll!memset
0000009c46eaed10 00007fff0ef40c65 ntdll!RtlpLowFragHeapAllocFromContext+0x355, calling ntdll!memset
0000009c46eaed70 00007fff0ef407c7 ntdll!RtlAllocateHeap+0xd7, calling ntdll!RtlpLowFragHeapAllocFromContext
0000009c46eaede0 00007fff0ef407c7 ntdll!RtlAllocateHeap+0xd7, calling ntdll!RtlpLowFragHeapAllocFromContext
0000009c46eaee80 00007fff05f5a89a mscoree!calloc_impl+0x72, calling ntdll!RtlAllocateHeap
0000009c46eaeeb0 00007fff0c135ac4 KERNELBASE!SetTEBLangID+0x2c, calling ntdll!RtlSetLastWin32ErrorAndNtStatusFromNtStatus
0000009c46eaeef0 00007fff05cf15e6 mscoreei!calloc_impl+0x5d, calling ntdll!RtlAllocateHeap
0000009c46eaef20 00007fff05cf145b mscoreei!initptd+0xb7, calling mscoreei!unlock
0000009c46eaef30 00007fff0ef40c65 ntdll!RtlpLowFragHeapAllocFromContext+0x355, calling ntdll!memset
0000009c46eaef50 00007fff05cf138e mscoreei!CRT_INIT+0x135, calling kernel32!GetCurrentThreadId
0000009c46eaef80 00007fff05cf11ee mscoreei!__DllMainCRTStartup+0x8a, calling mscoreei!DllMain
0000009c46eaefe0 00007fff0c3b1387 00007fff0c3b1387
0000009c46eaf030 00007fff04c111d2 00007fff04c111d2, calling 00007fff04c11070
0000009c46eaf0c0 00007fff0ef407c7 ntdll!RtlAllocateHeap+0xd7, calling ntdll!RtlpLowFragHeapAllocFromContext
0000009c46eaf0f0 00007fff0ef2c187 ntdll!RtlDeactivateActivationContextUnsafeFast+0xc7, calling ntdll!_security_check_cookie
0000009c46eaf110 00007fff05085c6e MSVCR120_CLR0400!calloc_impl+0x5d, calling ntdll!RtlAllocateHeap
0000009c46eaf120 00007fff0ef2c2a3 ntdll!RtlActivateActivationContextUnsafeFast+0x93, calling ntdll!_security_check_cookie
0000009c46eaf140 00007fff05085d9b MSVCR120_CLR0400!initptd+0xb7, calling MSVCR120_CLR0400!unlock
0000009c46eaf160 00007fff03064d9c clrjit!__DllMainCRTStartup+0x8d, calling clrjit!DllMain
0000009c46eaf190 00007fff0ef4b9b8 ntdll!LdrpReleaseModuleEnumLock+0x1c, calling ntdll!RtlReleaseSRWLockShared
0000009c46eaf1a0 00007fff0ef2c324 ntdll!LdrpCallInitRoutine+0x4c
0000009c46eaf1c0 00007fff0ef4b96b ntdll!LdrpReleaseLoaderLock+0x27, calling ntdll!LdrpReleaseModuleEnumLock
0000009c46eaf200 00007fff0ef2c083 ntdll!LdrpInitializeThread+0x1f3, calling ntdll!LdrpReleaseLoaderLock
0000009c46eaf270 00007fff0ef2bfc3 ntdll!LdrpInitializeThread+0x133, calling ntdll!RtlActivateActivationContextUnsafeFast
0000009c46eaf278 00007fff0ef2bff6 ntdll!LdrpInitializeThread+0x166, calling ntdll!RtlDeactivateActivationContextUnsafeFast
0000009c46eaf2e0 00007fff0ef28fa3 ntdll!_LdrpInitialize+0x93, calling ntdll!NtTestAlert
0000009c46eaf350 00007fff0ef28ec8 ntdll!LdrInitializeThunk+0x18, calling ntdll!NtContinue
0000009c46eaf7c0 00007fff0533fc9f clr!Thread::intermediateThreadProc+0x66, calling clr!_chkstk
0000009c46eaf800 00007fff0e7713d2 kernel32!BaseThreadInitThunk+0x22
0000009c46eaf830 00007fff0ef25444 ntdll!RtlUserThreadStart+0x34

This makes me believe that the thread is waiting for some resource (is this correct?) and this is where I get really lost! What is going on in those lines in the middle of the trace:

0000009c46eaefe0 00007fff0c3b1387 00007fff0c3b1387
0000009c46eaf030 00007fff04c111d2 00007fff04c111d2, calling 00007fff04c11070

My guess would be that some managed stuff is going on here, but why is !clrstack not working then? By looking at the first frame it looks to me as it is waiting for some resource handle. I looks like the handle is 0xa but I'm far from sure of this. Looking in to the handle with !handle 0xa ff i get this:

0:047> !handle 0xa ff
Handle 000000000000000a
  Type          File
  Attributes    0
  GrantedAccess 0x100020:
         Synch
         Execute/Traverse
  HandleCount   2
  PointerCount  65535
  No object specific information available

Telling me that this points to a file, but WHAT file, how do I move on from here? Looking at other of the top runway threads gives me the same picture.

The cry for help

I know that this is a huge questing, but I really need help moving on from here. Am I on the right track or am I just poking around in the dark? Any help would be greatly appreciated!

The news

After making our IT dep. record a perfMon dataset to me with some counters that I found interesting I have come to a conclusion: 1) We are leaking threads 2) The GC is going crazy (probably because of the leak). Any ideas on how to detect what is causing the thread leak? See the counters here: 2

After discovering that we are leaking threads, I have been looking into our code and found some code of interest:

// Initialize TimerExecutionEveryMinute timer
const double timeConversion = 60 * 1000; //convert one minute to milliseconds

var timer1 = new System.Timers.Timer { Enabled = true, Interval = timeConversion };
timer1.Elapsed += TimerExecutionEveryMinute;

and then:

// Execution every minute
public static void TimerExecutionEveryMinute(object sender, EventArgs e)
{
    var jpsLogger = KernelContainer.Kernel.Get<IJpsLogger>();

    // Initialize MemBag
    MemBag.Log.ActivityIdReset(Guid.NewGuid());
    MemBag.Log.BaseType = "TimerExecution";
    MemBag.Log.BaseClass = "Timer";
    MemBag.Log.BaseMethod = "TimerExecutionEveryMinute";

    // Statistic timer job
    var t1 = jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "One minute timer begin");

        var t2 = jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "    Method.WriteDB begin");
            Method.WriteDB();
        jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "    Method.WriteDB end", t2);

        t2 = jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "    Memory.LogCurrentState begin");
            Memory.LogCurrentState();
        jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "    Memory.LogCurrentState end", t2);


        //Calculates the CPU load based on samples taken at every timer step
        t2 = jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "    CPU load begin");
            CPULogger.LogCpu();
        jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "    CPU load end", t2);

        // Dump log information to file
        t2 = jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "    FileLogger.WriteAsync begin");
            FileLogger.WriteAsync();
        jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "    FileLogger.WriteAsync end", t2);

        jpsLogger.Trace.SpecializedDebug("Analyses.OneMinuteTimer", "One minute timer end", t1);
}

Could this be the place that is leaking the threads? I believe that System.Timers.Timer spins up new thread every time i creates event and that it is threadsafe so i creates locks around the executing code, the code getting executed every minute among other things write to log files, my thesis if that access to the file gets blocked and threads keeps piling up which would explain the linear raising number of logical threads as seen in 2

like image 932
iCediCe Avatar asked Jun 23 '15 11:06

iCediCe


People also ask

How do I debug high CPU usage in Windows?

Use Task Manager to view CPU consumption To access Task Manager, right-click the taskbar and click Task Manager. If you see a single list of tasks in the Task Manager window, click the More Details arrow to show the expanded view. Click the Processes tab if it is not already selected by default.


1 Answers

I don't have an answer, but let me try to give some tips.

In your question you mention

Recently our CPU usage has increased drastically...

Does this imply that the app was working fine before? As in there were no unusual CPU spikes that you were aware of?

If that's the case then you need to look at what's changed recently:

  • Has any new code for your app been deployed, specifically anything that allocates a lot of data?

  • Have any updates been installed on the server, if so, can you audit them and check any Microsoft Knowledge Base articles (or simply Google the update name, see if any blogs mention them) that might mentioned the symptoms you're encountering.

In your situation it looks like the Garbage Collector is going haywire. Fist thing I would do is look at any new code that's been deployed - it could be that the code (or a side-effect of that code) is allocating objects with a side-effect of the GC going into overdrive to deal with the memory pressure.

With that in mind, why not download a trisl version of a .NET memory management tool such as RedGate's ANTS Memory Profiler and take some snapshots of memory dumps during the spikes. Using tools like this is easier to make comparisons between dumps, such as telling you the number of objects allocated/deallocated, amount of memory used, etc. It might give you a clue.

like image 151
Jason Evans Avatar answered Sep 22 '22 10:09

Jason Evans