Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How much garbage collection is too much

Our web application has several hundreds of users and is (as always) mainly consisting of legacy code.

After our move to Azure we were able to see and measure more metrics than before. We are experiencing performance issues and I see that our garbage collection is going through the roof (As measured in the diagnostics tab of the web app, under the performance counters section). In one minute we were able to have these numbers:

  • 90160 GEN 0 garbage collections
  • 76910 GEN 1 garbage collections
  • 75110 GEN 2 garbage collections

And this for a mere 18580 HTTP requests, so on average we have:

  • 4,85 GEN 0 garbage collections per request
  • 4,13 GEN 1 garbage collections per request
  • 4,04 GEN 2 garbage collections per request

These numbers are still rising, even though the amount of requests stays about the same (see graph)

GC going bonkers

My question / remarks are:

  • Are these numbers the amount of objects that the GC cleans, or the amount of times GC had to be active?
  • Under a load like this, how much GC's would be concidered as "normal", knowing very well that none is the perfect answer, but practically...
  • How is it possible that even if the amounts of request stays the same, the number of GC collections are rising like this?

Thank you very much in advance, John


Update 1: 30/06/2018 @ 8:16 UTC+2

After updating application insights to monitor the garbage collection more closely I found a big eye-opener in performance hit. First of all, this is the average percent of time spent in GC:

Average time in GC around 4 and a half percent

It averages around 4,5 percent of the time (but there has been an inactive period during the night in this period) and the average peeks around 10% of the time. Then I thought to visualize the maximum amount of time the application was in GC mode and I almost fell of my chair:

Maximum amount of time in GC around 99 percent

This might be a wrong image. But it illustrates that our code has to wait for GC a lot! We really are going to have to fix this.

like image 586
John Verbiest Avatar asked Jun 29 '18 08:06

John Verbiest


1 Answers

These numbers are suspiciously high for a typical web app. I'd say they are 10-1000x what they normally are (75110 GEN 2 in one minute. Sounds more like a micro benchmark for the GC :) ).

Is someone calling GC.Collect()? Grep the source code for that.

Still, you need to find out if they are causing your perf problem. If they are not causing it then there is no need to fix this. Look at the time spent in GC counter. You can use PerfView to easily measure the GC pauses that are taken. That get's you an idea for the pause latency that customers are facing.

Are these numbers the amount of objects that the GC cleans, or the amount of times GC had to be active?

These are GCs, not objects.

Under a load like this, how much GC's would be concidered as "normal", knowing very well that none is the perfect answer, but practically...

"None" is certainly not the right answer. There is no point in saving GCs if it does not improve performance meaningfully. You spend dev time for nothing if you go after that goal. You can certainly take a "normal" number of GCs.

There is no way to give a normal number. It's more about time spend in GC (which is the overhead you have to pay for) and the G2 pause times that customers will take waiting for the page to load.

How is it possible that even if the amounts of request stays the same, the number of GC collections are rising like this?

Something horrible sits in your code I would say :) Maybe threads continuously spinning up calling GC.Collect()?! A nightmare come true. Grep your code and report back. I will extend this answer to help your investigation.

Using PerfView or some profiler (I use JetBrains) you should be able to see where GCs are triggered in your code.

like image 130
usr Avatar answered Sep 29 '22 04:09

usr