 

Tuning garbage collection for low latency

I'm looking for arguments as to how best to size the young generation (with respect to the old generation) in an environment where low latency is critical.

My own testing tends to show that latency is lowest when the young generation is fairly large (e.g. -XX:NewRatio < 3), but I cannot reconcile this with the intuition that the larger the young generation, the more time it should take to garbage collect.
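
(For reference, -XX:NewRatio sets the ratio of old generation to young generation size, so for example:

    -XX:NewRatio=3    old gen is 3x the young gen, i.e. young gen is roughly 1/4 of the heap

and values below 3 give a comparatively larger young generation.)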

The application runs on 64-bit Linux, on JDK 6.

Memory usage is about 50 MB of long-lived objects loaded at startup (a data cache); from there on, only (many) very short-lived objects are created, with an average lifespan of under 1 millisecond.

Some garbage collection cycles take more than 10 milliseconds to run, which looks really disproportionate given that the application latency itself is a few milliseconds at most.

asked May 06 '10 by Eleco

People also ask

How can I speed up my garbage collection?

Short of avoiding garbage collection altogether, there is only one way to make garbage collection faster: ensure that as few objects as possible are reachable during the garbage collection. The fewer objects that are alive, the less there is to be marked. This is the rationale behind the generational heap.

Does garbage collection affect performance?

The most common performance problem associated with Java™ relates to the garbage collection mechanism. If the size of the Java heap is too large, the heap must reside outside main memory. This causes increased paging activity, which affects Java performance.

Which garbage collection algorithm gives best throughput?

Garbage-first (G1) collector. G1 is a server-style collector designed for multiprocessor machines with a large amount of memory. The collector tries to achieve high throughput along with short pause times, while requiring very little tuning.


1 Answer

For an application that generates lots of short-lived garbage and nothing long-lived, one approach that can work is a big heap with nearly all of it young generation (YG), nearly all of that eden, and tenuring anything that survives a YG collection more than once.

For example (let's say you had a 32-bit JVM):

  • 3072M heap (Xms and Xmx)
  • 128M tenured (i.e. Xmn 2944m)
  • MaxTenuringThreshold=1
  • SurvivorRatio=190 (i.e. each survivor space is 1/192 of the YG)
  • TargetSurvivorRatio=90 (i.e. fill those survivors as much as possible)
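
Put together as a launch command (treating the exact numbers as illustrative rather than prescriptive, and with YourMainClass standing in for the real entry point), that might look something like:

    java -Xms3072m -Xmx3072m -Xmn2944m \
         -XX:MaxTenuringThreshold=1 \
         -XX:SurvivorRatio=190 \
         -XX:TargetSurvivorRatio=90 \
         YourMainClass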

The exact parameters you would use for this setup depend on the steady-state size of your working set (i.e. how much is alive at the time of each collection). The thinking here obviously goes against the normal heap-sizing rules, but then you don't have an app that behaves in the normal way. The idea is that the app produces mostly very short-lived garbage plus a bit of static data, so set the JVM up so that the static data gets into tenured quickly, and then make the YG big enough that it doesn't get collected very often, thus minimising the frequency of the pauses. You'd need to twiddle knobs repeatedly to work out what a good size is for you and how that balances against the length of the pause you get per collection. You might find shorter but more frequent YG pauses are achievable, for example.
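
While experimenting, it helps to have the collector tell you what each pause cost. As a minimal sketch, using the standard HotSpot logging switches (the log file name is just an example):

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -Xloggc:gc.log \
         YourMainClass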

You don't say how long your app runs for, but the target here is to have no tenured collections at all for the life of the app. This may be impossible, of course, but it's worth aiming for.

However, it's not just the collection algorithm that is important in your case; it's also where the memory is allocated. The NUMA collector (only compatible with the throughput collector, and activated with the UseNUMA switch) makes use of the observation that an object is often used purely by the thread that created it, and allocates memory accordingly. I'm not sure what it relies on under Linux, but on Solaris it uses MPO (memory placement optimisation); there are some details on one of the GC engineers' blogs.

Since you're using a 64-bit JVM, make sure you're using CompressedOops as well.
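
As a sketch, assuming you pick the throughput (parallel) collector so that UseNUMA applies, and a JDK 6 update recent enough to support compressed oops, those switches would be added along the lines of:

    java -XX:+UseParallelGC -XX:+UseNUMA -XX:+UseCompressedOops \
         YourMainClass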

Given that rate of object allocation (possibly some sort of scientific library?) and that object lifetime, you should give some consideration to object reuse. One example of a library doing this is the Javolution StackContext.
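
As a rough illustration of the reuse idea (a hand-rolled sketch, not the Javolution API; the class and field names here are made up):

    // Hand-rolled sketch of object reuse: recycle a fixed ring of mutable
    // instances instead of allocating one per event, so the hot path feeds
    // nothing new into the young generation.
    final class Measurement {
        double value;
        long timestampNanos;

        void reset() {
            value = 0.0;
            timestampNanos = 0L;
        }
    }

    final class MeasurementPool {
        private final Measurement[] ring;
        private int next;

        MeasurementPool(int size) {
            ring = new Measurement[size];
            for (int i = 0; i < size; i++) {
                ring[i] = new Measurement();
            }
        }

        // Fast path allocates nothing. Single-threaded use only: the caller
        // must be finished with the instance handed out 'size' calls ago.
        Measurement acquire() {
            Measurement m = ring[next];
            next = (next + 1) % ring.length;
            m.reset();
            return m;
        }
    }

Whether that kind of thing pays for its loss of clarity depends on how hot the allocation path really is.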

Finally, it's worth noting that GC pauses are not the only stop-the-world (STW) pauses. You could run with the 6u21 early-access build, which has some fixes to the PrintGCApplicationStoppedTime and PrintGCApplicationConcurrentTime switches (these effectively print the time spent at a global safepoint and the time between those safepoints). You can use the TraceSafepointStatistics flag to get some idea of what is causing the need for a safepoint (i.e. a point where no bytecode is being executed by any thread).
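
For example, the two pause-time switches mentioned above can simply be added to the launch line (a sketch; combine with whatever other options you settle on):

    java -XX:+PrintGCApplicationStoppedTime \
         -XX:+PrintGCApplicationConcurrentTime \
         YourMainClass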

answered Oct 18 '22 by Matt