 

How to reduce time spent in GC

I'm creating a desktop application that has a compute-heavy operation which potentially runs for several seconds, so obviously I need to minimize the time of this operation. The operation is fairly easy to parallelize (independent subtasks), and each subtask takes around 50ms on a single thread. On multiple threads, each subtask takes 4-5 times as long because 40-50% of the time is spent in GC, effectively cancelling the speedup completely.

So I need to give the GC less work. My first thought was to find out which type of object was being garbage collected the most, but I realized that although I often do memory profiling, I had never searched for a pattern like this. I usually look at heap snapshots, or differences between heap snapshots, but these show objects that are alive, not the objects that were created and disposed of between those snapshots. So that is my first question: what is the easiest way to find which types are created and garbage collected the most? I tried looking at method call counts to see if some constructor was called suspiciously often, but all the objects created in the millions were small struct types. If I understand things correctly, these should have no effect on GC even if boxed?
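Profilers aside, one quick way to quantify (though not attribute by type) the GC pressure of an operation is to bracket it with `GC.CollectionCount`, which reports how many collections of each generation have occurred since the process started. A minimal C# sketch, where `RunOperation` is a hypothetical stand-in for the real computation:

```csharp
using System;

class GcPressureProbe
{
    static void Main()
    {
        // Snapshot the collection counts before the operation.
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        RunOperation(); // the compute-heavy operation under investigation

        // The deltas show how many collections the operation triggered.
        Console.WriteLine("gen0: " + (GC.CollectionCount(0) - gen0));
        Console.WriteLine("gen1: " + (GC.CollectionCount(1) - gen1));
        Console.WriteLine("gen2: " + (GC.CollectionCount(2) - gen2));
    }

    // Hypothetical stand-in that allocates heavily, just to have something to measure.
    static void RunOperation()
    {
        for (int i = 0; i < 1000000; i++)
        {
            var tmp = new int[16]; // short-lived heap allocation
            tmp[0] = i;
        }
    }
}
```

This tells you how often each generation is collected, but finding which types account for the garbage still requires a memory profiler with allocation tracking.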

The algorithm creates hundreds of thousands of individual result point objects. These of course aren't supposed to be GC'd, because they represent the output of the operation. But that leads me to my second question: is the time spent in GC mostly dependent on the total number of objects, or mostly on the number of objects actually collected? Should I try to limit the number of result objects and instead use fewer but larger ones?
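For a generational collector, the cost of each collection is dominated by tracing the objects that survive, so a large set of long-lived result objects adds work to every full collection even though it is never reclaimed. One sketch of the "fewer but larger" idea (all names here are hypothetical): keep the results in a single struct array, so the whole result set is one heap object instead of hundreds of thousands:

```csharp
using System;

// Value type: stored inline in the array, not tracked individually by the GC.
struct ResultPoint
{
    public double X, Y, Value;
}

class ResultBuffer
{
    private readonly ResultPoint[] points;
    private int count;

    public ResultBuffer(int capacity)
    {
        points = new ResultPoint[capacity]; // one allocation for the whole result set
    }

    public void Add(double x, double y, double value)
    {
        points[count].X = x;
        points[count].Y = y;
        points[count].Value = value;
        count++;
    }

    public int Count { get { return count; } }
}
```

During a collection the GC then traces one array reference rather than N individual object references.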

Edit: I found the time spent in GC using the VS 2010 concurrency visualizer. Also, in the parallel piece of code, most sections of blocked threads were waiting for GC.

Edit: I should clarify that the performance problem is because the execution is effectively serialized on the workstation GC. See for example the performance problem described in this post.

http://blogs.msdn.com/b/hshafi/archive/2010/06/17/case-study-parallelism-and-memory-usage-vs2010-tools-to-the-rescue.aspx

I can't do anything about the garbage collector blocking my threads (and I don't think I want the server GC for a desktop app, correct?). So in order to get a linear speedup for this operation, I need to reduce the number of times the GC is invoked. Most of the time is actually wasted by other threads blocked, waiting for one thread to finish GC.

asked Nov 13 '11 by Anders Forsgren




3 Answers

Personally, if your tasks take only 50ms to execute, the overhead of thread creation etc. is going to cost more time than your actual jobs, which is what it appears you are seeing. So you might not be able to get much further with this approach.

As for seeing what is out there, the best tools I've used are ANTS Profiler (Memory and Performance). With them you can see objects in memory, and differences between points in time, as well as "number of executions", which should get you what you want.

answered Nov 03 '22 by Mitchel Sellers


Perhaps you should look at increasing the cache hits between your objects.

So rather than creating new struct points and then performing calculations in lists/enumerables, have you tried allocating a fixed array of points and continuously reusing them? That way you allocate the objects only once, perform your calculations, and return. You will benefit from a hot cache, and you will not incur any GC at all if you can completely reuse the array.
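A minimal C# sketch of this reuse pattern (the names and the per-point computation are placeholders): allocate the scratch array once per worker, then overwrite it in place for every subtask so the inner loop allocates nothing.

```csharp
using System;

class Worker
{
    private readonly double[] scratch;   // allocated once, reused for every subtask

    public Worker(int size)
    {
        scratch = new double[size];
    }

    public double RunSubtask(int seed)
    {
        // Overwrite the buffer in place instead of building new lists each time.
        for (int i = 0; i < scratch.Length; i++)
            scratch[i] = (seed + i) * 0.5;   // stand-in for the real per-point work

        double sum = 0;
        for (int i = 0; i < scratch.Length; i++)
            sum += scratch[i];
        return sum;
    }
}
```

For example, `new Worker(4).RunSubtask(0)` fills the buffer with 0.0, 0.5, 1.0, 1.5 and returns 3.0; calling `RunSubtask` again reuses the same array with no further allocation.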

answered Nov 03 '22 by Spence


Old question, but for those that stumble on it...

I had exactly the same problem and fixed it permanently by enabling server-mode garbage collection: http://msdn.microsoft.com/en-us/library/ms229357(v=vs.110).aspx.

In app.config add:

  <runtime>
     <gcServer enabled="true" />
  </runtime>

That alone sped my code up by an order of magnitude, with no side effects that I could find.

If you know exactly where you're generating a lot of garbage, I also found that GCLatencyMode.LowLatency (http://msdn.microsoft.com/en-us/library/system.runtime.gclatencymode(v=vs.110).aspx) brought my GCs down to a single generation-1 GC:

GC.Collect() ' pre-emptively collect before the time-critical region
Dim oldmode As GCLatencyMode = GCSettings.LatencyMode

' Mark the following Try/Finally as a constrained execution region,
' so that restoring the latency mode in Finally is guaranteed to run.
RuntimeHelpers.PrepareConstrainedRegions()

Try
    GCSettings.LatencyMode = GCLatencyMode.LowLatency

    ' Work that allocates tons of memory here

Finally
    GCSettings.LatencyMode = oldmode
End Try

(RuntimeHelpers.PrepareConstrainedRegions marks the following Try/Finally as a constrained execution region, which helps guarantee the Finally block runs even if the Try body is interrupted; for simply restoring the latency mode, a plain Try/Finally would normally be enough.)

answered Nov 03 '22 by smirkingman