Java garbage collector G1GC taking long for 'Object Copy' (Evacuation Pause)

Tags: java, garbage-collection

I'm not a Java newbie, but I know only a tiny bit about garbage collection. Now I would like to change that with some practical experience. My goal is a latency of under 0.3 seconds; in extreme cases 0.5 seconds is okay too.

I have an application running with -Xmx50g (-Xms50g) and the following additional GC options:

-XX:+UseG1GC -Xloggc:somewhere.gc.log -XX:+PrintGCDateStamps

But now I'm occasionally seeing long pauses of over 5 seconds due to garbage collection, even though there seems to be enough memory available. One reason I found:

[GC pause (G1 Evacuation Pause) (young) 42G->40G(48G), 5.9409662 secs]

Why is G1GC still doing a "stop the world" for this? (Or at least I see that it stops my application at exactly this time.) And why does it do such a costly cleanup when it is not really necessary, as over 12% of the available RAM is free? Also, I thought that the default value for -XX:MaxGCPauseMillis is 200 milliseconds; why is this value violated by a factor of 29 or even 50 (see below)?

Another reason for a delay was:

[GC pause (Metadata GC Threshold) (young) (initial-mark) 40G->39G(48G), 10.4667233 secs]

This will probably be solved as described in this answer, i.e. by increasing the metaspace size via -XX:MetaspaceSize=100M

BTW: Using Java SE 1.8.0_91-b14

Update: detailed GC log of such an event

2016-08-12T09:20:31.589+0200: 1178.312: [GC pause (G1 Evacuation Pause) (young) 1178.312: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3159, predicted base time: 1.52 ms, remaining time: 198.48 ms, target pause time: 200.00 ms]
 1178.312: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 136 regions, survivors: 20 regions, predicted young region time: 1924.75 ms]
 1178.312: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 136 regions, survivors: 20 regions, old: 0 regions, predicted pause time: 1926.27 ms, target pause time: 200.00 ms]
 1185.330: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 21.83 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)]
 1185.330: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 42580574208 bytes, allocation request: 0 bytes, threshold: 23592960000 bytes (45.00 %), source: end of GC]
 1185.330: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 1 regions, reclaimable: 3381416 bytes (0.01 %), threshold: 5.00 %]
, 7.0181903 secs]
   [Parallel Time: 6991.8 ms, GC Workers: 10]
      [GC Worker Start (ms): Min: 1178312.6, Avg: 1178312.8, Max: 1178312.9, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 1.1, Avg: 1.5, Max: 2.3, Diff: 1.2, Sum: 15.0]
      [Update RS (ms): Min: 0.0, Avg: 0.3, Max: 1.3, Diff: 1.3, Sum: 3.4]
         [Processed Buffers: Min: 0, Avg: 2.1, Max: 5, Diff: 5, Sum: 21]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 1.7]
      [Object Copy (ms): Min: 6964.1, Avg: 6973.0, Max: 6989.5, Diff: 25.3, Sum: 69730.4]
      [Termination (ms): Min: 0.0, Avg: 16.4, Max: 25.3, Diff: 25.3, Sum: 164.4]
         [Termination Attempts: Min: 1, Avg: 3.2, Max: 13, Diff: 12, Sum: 32]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 6991.5, Avg: 6991.6, Max: 6991.7, Diff: 0.2, Sum: 69915.5]
      [GC Worker End (ms): Min: 1185304.3, Avg: 1185304.3, Max: 1185304.3, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 26.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 25.3 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.2 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.2 ms]
   [Eden: 2176.0M(2176.0M)->0.0B(2176.0M) Survivors: 320.0M->320.0M Heap: 40.6G(48.8G)->40.0G(48.8G)]
 [Times: user=0.55 sys=46.58, real=7.02 secs]

Read here about it: Copying (Stop the World Event) - These are the stop the world pauses to evacuate or copy live objects to new unused regions. This can be done with young generation regions which are logged as [GC pause (young)]. Or both young and old generation regions which are logged as [GC Pause (mixed)].

asked Aug 11 '16 20:08

Karussell

1 Answer

Why is G1GC still doing a "stop the world" for this?

Because G1 is not a pauseless collector, it is just a low-pause collector.

Also I thought that the default value for -XX:MaxGCPauseMillis is 200 milliseconds, why is this value violated by a factor of 29 or even 50 (see below)?

It is, but it's just a goal, not a guarantee. Many things can cause G1 to miss that goal. You have a fairly large heap, which makes things more difficult, i.e. failures are easier to provoke.
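
To see how far a given pause overshoots the goal, the summary lines from the question can be checked mechanically. The following is only a minimal sketch: the class PauseGoalCheck and its methods are hypothetical helpers (not part of the JDK or any GC tool), and the 200 ms goal is the default value mentioned in the question.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the JDK or any GC tool.
class PauseGoalCheck {
    // Matches the duration at the end of a summary line such as
    // "[GC pause (G1 Evacuation Pause) (young) 42G->40G(48G), 5.9409662 secs]".
    private static final Pattern PAUSE_SECS =
            Pattern.compile(",\\s*([0-9]+\\.[0-9]+) secs\\]");

    /** Returns the pause in milliseconds, or -1 if the line carries no duration. */
    static double pauseMillis(String logLine) {
        Matcher m = PAUSE_SECS.matcher(logLine);
        return m.find() ? Double.parseDouble(m.group(1)) * 1000.0 : -1.0;
    }

    /** Ratio of the observed pause to the pause goal (1.0 means exactly on target). */
    static double overshootFactor(String logLine, double goalMillis) {
        return pauseMillis(logLine) / goalMillis;
    }

    public static void main(String[] args) {
        double goalMillis = 200.0; // assumed: default -XX:MaxGCPauseMillis, per the question
        String[] lines = {
            "[GC pause (G1 Evacuation Pause) (young) 42G->40G(48G), 5.9409662 secs]",
            "[GC pause (Metadata GC Threshold) (young) (initial-mark) 40G->39G(48G), 10.4667233 secs]"
        };
        for (String line : lines) {
            System.out.println(String.format(Locale.ROOT, "%.0f ms pause, %.1fx the goal",
                    pauseMillis(line), overshootFactor(line, goalMillis)));
        }
    }
}
```

For the two pauses quoted in the question this gives roughly the factors of 29 and 50 the asker mentions. It only quantifies the miss; it says nothing about the cause.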

Anyway, the GC tuning journey starts with enabling verbose GC logging via

-Xloggc:<path to gc log file>
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails

Update: These options are for HotSpot 8. Java 9 and later use unified logging, which has a different parameter format.

Then run the resulting log through GCViewer to get a general overview, and go back to reading individual log entries (there are many answers and blog posts on this topic) to figure out what might be causing the worst behavior. Depending on the cause, various remedies can be tried.

Some understanding of how tracing garbage collectors work in general, and G1 in particular, will be necessary to avoid cargo-culting.

My application has many allocations which could be easily called "humongous allocations".

If that actually is the cause, then current VMs have some experimental options to reclaim them sooner.
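
Whether an allocation really counts as humongous depends on the region size, which can be read off the detailed log in the question: eden is reported as 136 regions and 2176.0M, so each region is 16M. G1 treats an object larger than half a region as humongous. The sketch below just redoes that arithmetic; the class HumongousThreshold and its method names are made up for illustration.

```java
// Hypothetical helper for illustration; the names are not from the JDK.
class HumongousThreshold {
    /** Region size in bytes, derived from a space's total size and its region count. */
    static long regionSizeBytes(long spaceBytes, int regions) {
        return spaceBytes / regions;
    }

    /** True if an object of this size is larger than half a region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Figures taken from the log in the question: "eden: 136 regions" and "Eden: 2176.0M".
        long region = regionSizeBytes(2176L * mib, 136);
        System.out.println("region size (MiB): " + region / mib);
        System.out.println("humongous above (MiB): " + (region / 2) / mib);
        System.out.println("10 MiB object humongous? " + isHumongous(10L * mib, region));
    }
}
```

With these figures, only objects above 8M would go down the humongous path, so it is worth checking actual allocation sizes before assuming that this is the cause.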

 [Object Copy (ms): Min: 6964.1, Avg: 6973.0, Max: 6989.5, Diff: 25.3, Sum: 69730.4]
 [Times: user=0.55 sys=46.58, real=7.02 secs]

This means it spends most of its time in the kernel while doing something that should consist mostly of memory accesses, not system calls. So swap activity or transparent huge pages are likely suspects.
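
A rough way to spot such entries, and to check the transparent huge pages setting on a Linux host, is sketched below. The class KernelTimeCheck and its methods are hypothetical. The sysfs path is the usual Linux location for the THP mode and is assumed to exist on the machine in question; the sketch skips the check if the file is not readable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the JDK or any GC tool.
class KernelTimeCheck {
    // Matches the CPU times in a line such as "[Times: user=0.55 sys=46.58, real=7.02 secs]".
    private static final Pattern TIMES = Pattern.compile("user=([0-9.]+) sys=([0-9.]+)");
    // The active THP mode is the bracketed word, e.g. "always [madvise] never".
    private static final Pattern ACTIVE_MODE = Pattern.compile("\\[(\\w+)\\]");

    /** Share of CPU time (user + sys) spent in the kernel, or -1 if the line does not match. */
    static double kernelShare(String timesLine) {
        Matcher m = TIMES.matcher(timesLine);
        if (!m.find()) {
            return -1.0;
        }
        double user = Double.parseDouble(m.group(1));
        double sys = Double.parseDouble(m.group(2));
        double total = user + sys;
        return total == 0.0 ? 0.0 : sys / total;
    }

    /** Extracts the active mode from the sysfs content, or "unknown" if none is marked. */
    static String activeThpMode(String sysfsContent) {
        Matcher m = ACTIVE_MODE.matcher(sysfsContent);
        return m.find() ? m.group(1) : "unknown";
    }

    public static void main(String[] args) throws Exception {
        String line = "[Times: user=0.55 sys=46.58, real=7.02 secs]";
        System.out.println("kernel share of CPU time: " + kernelShare(line));

        // Assumption: Linux host with the usual sysfs layout.
        Path thp = Paths.get("/sys/kernel/mm/transparent_hugepage/enabled");
        if (Files.isReadable(thp)) {
            String content = new String(Files.readAllBytes(thp));
            System.out.println("transparent huge pages: " + activeThpMode(content));
        }
    }
}
```

For the entry above, almost all CPU time is system time. A healthy evacuation pause would normally show the opposite ratio.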

answered Oct 24 '22 19:10

the8472
