I will try to explain the problem briefly. I work in the supply chain domain, where we deal with items/products and SKUs.
Say my entire problem set is 1 million SKUs and I am running an algorithm over them. My JVM heap size is 4 GB.
I can't process all the SKUs in one shot, as that would need a lot more memory. So I divide the problem set into smaller batches; each batch holds all the related SKUs that need to be processed together.
I then run several iterations to process the entire data set. If each batch holds approx. 5,000 SKUs, I will have 200 iterations/loops. All data for those 5,000 SKUs is required until the batch has finished processing, but when the next batch starts, the previous batch's data is no longer needed and can be garbage collected.
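As a minimal sketch of the batching pattern described above (class and method names are hypothetical, not from the original code): each batch's data is confined to one loop iteration, so it becomes unreachable, and thus eligible for collection, as soon as the next iteration begins.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchProcessor {
    // Process SKUs in fixed-size batches. All data for a batch lives
    // only inside one loop iteration; once the loop advances, the
    // previous batch's objects are unreachable and can be collected.
    static int processAll(List<String> skus, int batchSize) {
        int processed = 0;
        for (int start = 0; start < skus.size(); start += batchSize) {
            int end = Math.min(start + batchSize, skus.size());
            // Copy the slice so this batch holds its own references.
            List<String> batch = new ArrayList<>(skus.subList(start, end));
            // ... run the per-batch algorithm here ...
            processed += batch.size();
            // `batch` goes out of scope here, so nothing pins it in memory.
        }
        return processed;
    }
}
```

The catch, as described below, is that if a batch survives longer than the young generation can accommodate, its objects get promoted to the old generation anyway, even though they are logically short-lived.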
That is the problem background. Now, the particular performance issue caused by GC: each batch takes approx. 2-3 seconds to finish. Within this time, the GC is unable to free any objects, because all the data is required until the end of processing that batch. So the GC promotes all these objects to the old generation (if I look at the YourKit profiler, there is hardly anything in the new generation). The old gen is therefore growing quickly, full GCs become necessary, and they are making my program very slow. Is there any way to tune the GC in such a case, or perhaps change my code to do the memory allocation differently?
PS: if each batch is very small, I don't see this issue. I believe this is because each batch completes quickly enough for the GC to free its objects before they would be promoted to the old gen.
A first Google hit indicates that you can use -XX:NewRatio to set a larger new generation size relative to the old generation. -XX:NewRatio is the ratio of old-generation to young-generation size, so a smaller value means a larger young generation: with -XX:NewRatio=1, the young and old generations each get half the heap.
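For example, a launch command along these lines (the jar name is a placeholder for your application; the exact young-gen size worth testing depends on your batch footprint) gives short-lived batch data more room to die in the young generation before being promoted:

```shell
# NewRatio=1 splits the 4 GB heap evenly: ~2 GB young, ~2 GB old.
# Alternatively, -Xmn sets an explicit young-generation size.
java -Xmx4g -XX:NewRatio=1 -jar sku-processor.jar
```

Note that if a whole batch's working set exceeds the young generation (plus survivor space), objects will still be promoted during the 2-3 second batch lifetime, so shrinking the batch size may be needed alongside the flag.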