Imagine that I define a class with dozens of reference fields (instead of using reference arrays such as Object[]
), and instantiate this class pretty heavily in an application.
Is it going to affect the performance of garbage collector in Hotspot JVM, when it traverses the heap to calculate reachable objects? Or, maybe, it would lead to significant extra memory consumption, for some JVM's internal data structures or class metadata? Or, is it going to affect the efficiency of an application in some other way?
Are those aspects specific to each garbage collector algorithm in Hotspot, or those parts of Hotspot's mechanics are shared and used by all garbage collectors alike?
The more live objects the collector finds, the longer the suspension, which has a direct impact on response time and throughput. This fundamental effect of garbage collection on application execution is called the garbage-collection pause, or GC pause time.
The most common performance problem associated with Java™ relates to the garbage collection mechanism. If the Java heap is larger than the available physical memory, parts of the heap get paged out to disk. The resulting paging activity degrades Java performance.
If your application's object creation rate is very high, the garbage collection rate must be correspondingly high to keep up. A high garbage collection rate also increases GC pause time, so optimizing the application to create fewer objects is an effective strategy for reducing long GC pauses.
The Old Generation stores long-surviving objects. Typically, an age threshold is set for young-generation objects; when an object reaches that age, it is moved to the old generation. Eventually the old generation needs to be collected, an event called a major garbage collection.
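These collection counts and pause times can be observed from within the application through the standard `GarbageCollectorMXBean` API. A minimal sketch (the class name `GcStats` and the allocation pattern are mine; the printed collector names depend on which GC the JVM is running):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcStats {
    public static void main(String[] args) {
        // Allocate enough short-lived objects to likely trigger at least
        // one young-generation collection.
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            byte[] chunk = new byte[1024];
            if (i % 100 == 0) {
                retained.add(chunk); // keep some objects alive so they can age
            }
        }
        // Each collector (young and old generation) reports its own counters.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + "ms");
        }
    }
}
```

Watching how these counters grow under load is a quick way to tell whether a high allocation rate is translating into frequent collections.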
Let me rephrase the question. "Is it better to have class A or class B, below?"
class A {
    Target[] array;
}

class B {
    Target a, b, c, ..., z;
}
The usual maintainability issues notwithstanding... From the VM's point of view, given a resolved reference to an instance of class B, reaching a Target field takes one dereference. In class A it takes two dereferences, because we also need to read through the array.
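The difference in access paths can be made concrete (a sketch restating the classes from the question; `Target` is left empty here just to make the example compile):

```java
class Target {}

class A {
    Target[] array = new Target[26];
}

class B {
    Target a; // stands in for the many declared reference fields a..z
}

public class Access {
    public static void main(String[] args) {
        A objA = new A();
        B objB = new B();
        objB.a = new Target();
        objA.array[0] = new Target();

        Target viaField = objB.a;        // one dereference: objB -> field slot
        Target viaArray = objA.array[0]; // two dereferences: objA -> array -> element
        System.out.println(viaField != null && viaArray != null); // true
    }
}
```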
The handling of object references in the two cases is subtly different: in class A, the VM knows there is a contiguous array of references, so it does not need to know anything else. In class B, the VM has to know which fields are references (there could be non-reference fields, for example), which requires maintaining oop maps in the class metadata:
// InstanceKlass embedded field layout (after declared fields):
...
// [EMBEDDED nonstatic oop-map blocks] size in words = nonstatic_oop_map_size
// The embedded nonstatic oop-map blocks are short pairs (offset, length)
// indicating where oops are located in instances of this
Note that while the footprint overhead is there, it is unlikely to matter much, unless you have lots of classes of this weird shape; but even then, the cost would be per-class, not per-instance.
Oop maps are built during class parsing, by the shared runtime code. The visitors that walk the oops of a particular object look into those oop maps to find the offsets of the references, and that code is also part of the shared runtime. So this overhead is independent of the GC implementation.
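Conceptually, the visitor logic looks something like the sketch below. This is a simplified illustration, not actual HotSpot code; the names `OopMapBlock`, `offsetWords`, and `countWords` are invented here to mirror the (offset, length) pairs described in the comment above:

```java
import java.util.List;

// Hypothetical stand-in for HotSpot's embedded (offset, length) oop-map pairs.
record OopMapBlock(int offsetWords, int countWords) {}

public class OopWalkSketch {
    // For class B, all consecutive reference fields typically collapse into a
    // single block. For a reference array, no per-class map is needed at all:
    // the element count comes from the array header instead.
    static int countOops(List<OopMapBlock> oopMaps) {
        int oops = 0;
        for (OopMapBlock block : oopMaps) {
            // Visit each reference slot within the block.
            for (int i = 0; i < block.countWords(); i++) {
                oops++; // a real visitor would read and process the oop here
            }
        }
        return oops;
    }

    public static void main(String[] args) {
        // One block covering 26 consecutive reference fields, as in class B.
        System.out.println(countOops(List.of(new OopMapBlock(2, 26)))); // 26
    }
}
```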
Considerations for performance:
It probably makes little sense to ask about the difference in GC/runtime handling of separate fields vs. arrays. Taking care of locality of reference quite probably gives a bigger bang for the buck, which tips the scale towards class B, with the associated maintainability overheads -- as quite a few performance tricks do.
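To illustrate the locality argument, here is a naive sketch (not a rigorous benchmark; use JMH for real measurements, and note that the class and method names are mine):

```java
public class LocalitySketch {
    static class Target { int payload = 1; }

    // Walking a reference array touches a contiguous run of reference slots;
    // the referents themselves may still be scattered around the heap, but
    // allocation order often keeps them close together as well.
    static long sumPayloads(Target[] refs) {
        long sum = 0;
        for (Target t : refs) {
            sum += t.payload;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Target[] contiguousRefs = new Target[n];
        for (int i = 0; i < n; i++) {
            contiguousRefs[i] = new Target();
        }
        System.out.println(sumPayloads(contiguousRefs)); // 1000000
    }
}
```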