Reducing JVM pause time > 1 second using UseConcMarkSweepGC

Tags:

I'm running a memory intensive app on a machine with 16Gb of RAM, and an 8-core processor, and Java 1.6 all running on CentOS release 5.2 (Final). Exact JVM details are:

java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

I'm launching the app with the following command line options:

java -XX:+UseConcMarkSweepGC -verbose:gc -server -Xmx10g -Xms10g ...

My application exposes a JSON-RPC API, and my goal is to respond to requests within 25ms. Unfortunately, I'm seeing delays up to and exceeding 1 second and it appears to be caused by garbage collection. Here are some of the longer examples:

[GC 4592788K->4462162K(10468736K), 1.3606660 secs]
[GC 5881547K->5768559K(10468736K), 1.2559860 secs]
[GC 6045823K->5914115K(10468736K), 1.3250050 secs]

Each of these garbage collection events was accompanied by a delayed API response of very similar duration to the length of the garbage collection shown (to within a few ms).

Here are some typical examples (these were all produced within a few seconds):

[GC 3373764K->3336654K(10468736K), 0.6677560 secs]
[GC 3472974K->3427592K(10468736K), 0.5059650 secs]
[GC 3563912K->3517273K(10468736K), 0.6844440 secs]
[GC 3622292K->3589011K(10468736K), 0.4528480 secs]

The thing is that I thought the UseConcMarkSweepGC would avoid this, or at least make it extremely rare. On the contrary, delays exceeding 100ms are occurring almost once a minute or more (although delays of over 1 second are considerably rarer, perhaps once every 10 or 15 minutes).

The other thing is that I thought only a FULL GC would cause threads to be paused, yet these don't appear to be full GCs.

It may be relevant to note that most of the memory is occupied by a LRU memory cache that makes use of soft references.

Any assistance or advice would be greatly appreciated.

425

asked Feb 22 '09 00:02

sanity

4 Answers

First, check out the Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning documentation, if you haven't already done so. This documentation says:

the concurrent collector does most of its tracing and sweeping work with the application threads still running, so only brief pauses are seen by the application threads. However, if the concurrent collector is unable to finish reclaiming the unreachable objects before the tenured generation fills up, or if an allocation cannot be satisfied with the available free space blocks in the tenured generation, then the application is paused and the collection is completed with all the application threads stopped. The inability to complete a collection concurrently is referred to as concurrent mode failure and indicates the need to adjust the concurrent collector parameters.

and a little bit later on...

The concurrent collector pauses an application twice during a concurrent collection cycle.

I notice that those GCs don't seem to be freeing very much memory. Perhaps many of your objects are long lived? You may wish to tune the generation sizes and other GC parameters. 10 Gig is a huge heap by many standards, and I would naively expect GC to take longer with such a huge heap. Still, 1 second is a very long pause time and indicates either something is wrong (your program is generating a large number of unneeded objects or is generating difficult-to-reclaim objects, or something else) or you just need to tune the GC.

Usually, I would tell someone that if they have to tune GC then they have other problems they need to fix first. But with an application of this size, I think you fall into the territory of "needing to understand GC much more than the average programmer."

As others have said, you need to profile your application to see where the bottleneck is. Is your PermGen too large for the space allocated to it? Are you creating unnecessary objects? jconsole works to at least show a minimum of information about the VM. It's a starting point. As others have indicated however, you very likely need more advanced tools than this.

Good luck.

187

answered Oct 05 '22 17:10

Eddie

Since you mention your desire to cache, I'm guessing that most of your huge heap is occupied by that cache. You might want to limit the size of the cache so that you are sure it never attempts to grow large enough to fill the tenured generation. Don't rely on SoftReference alone to limit the size. As the old generation fills with soft references, older references will be cleared and become garbage. New references (perhaps to the same information) will be created, but cleared quickly because free space is in short supply. Eventually, the tenured space is full of garbage and needs to be cleaned.

Consider adjusting the -XX:NewRatio setting too. The default is 1:2, meaning that one-third of the heap is allocated to the new generation. For a large heap, this is almost always too much. You might want to try something like 9, which would keep 9 Gb of your 10 Gb heap for the old generation.

answered Oct 05 '22 17:10

erickson

Turns out that part of the heap was getting swapped out to disk, so that garbage collection had to pull a bunch of data off the disk back into memory.

I resolved this by setting Linux's "swappiness" parameter to 0 (so that it wouldn't swap data out to disk).

answered Oct 05 '22 15:10

sanity

Here are some things I have found which could be significant.

JSON-RPC can generate a lot of objects. Not as much as XML-RPC, but still something to watch for. In any case you do appear to be generating as much at 100 MB of objects per second which means your GC is running a high percentage of the time and is likely to be adding to your random latency. Even though the GC is concurrent, your hardware/OS is very likely to exhibit non-ideal random latency under load.
Have a look at your memory bank architecture. On Linux the command is numactl --hardware. If your VM is being split across more than one memory bank this will increase your GC times significantly. (It will also slow down your application as these accesses can be significantly less efficient) The harder you work the memory subsystem the more likely the OS will have to shift memory around (Often in large amounts) and you get dramatic pauses as a result (100 ms is not surprising). Don't forget your OS does more than just run your app.
Consider compacting/reducing the memory consumption of your cache. If you are using multiple GB of cache it is worth looking at ways to cut memory consumption further than you have already.
I suggest you profile your app with memory allocation tracing AND cpu sampling on at the same time. This can yield very different results and often points to the cause of these sort of problems.

Using these approaches, the latency of an RPC call can be reduced to below 200 micro-second and the GC times reduced to 1-3 ms effecting less than 1/300 of calls.

answered Oct 05 '22 16:10

Peter Lawrey

Related questions
                            
                                how to define static fields for type parameter in java generics
                            
                                Why does Android Studio want me to use For Each instead of For Loop?
                            
                                Why does Android ignore READ_SMS permission?
                            
                                Java same enum with different values
                            
                                Insert object to specific position in jsonarray
                            
                                check for 4xx Http Code with native Java
                            
                                Localdate: format with locale
                            
                                Why 'condition && statement()' is not popular in Java world? [closed]
                            
                                Android Camera2 API YUV_420_888 to JPEG
                            
                                Spring RestTemplate streaming response into another request
                            
                                Post Increment Infinite loop i += i++; [duplicate]
                            
                                How can I solve "org.json.simple.JSONObject cannot be resolved"?
                            
                                Java Streams: find if stream contains null
                            
                                Java: Unmapped target properties
                            
                                how to deploy reactJS application on Tomcat Server
                            
                                Parse date or datetime both as LocalDateTime in Java 8
                            
                                Using Sqlite JDBC Driver in a Gradle Java Project
                            
                                Jackson cannot convert empty string value to enum
                            
                                Type mismatch: Required: Context, Found: Intent
                            
                                Inheritance in Java - "Cannot find symbol constructor"

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Reducing JVM pause time > 1 second using UseConcMarkSweepGC

Tags:

java

performance

garbage-collection

jvm

jvm-arguments

sanity

People also ask

4 Answers

Eddie

erickson

sanity

Peter Lawrey

Recent Activity

Donate For Us