 

How are SoftReferences collected by JVMs in practice?

I have two separate caches running in a JVM (one controlled by a third party library) each using soft references. I would prefer for the JVM to clear out my controlled cache before the one controlled by the library. The SoftReference javadoc states:

All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. Otherwise no constraints are placed upon the time at which a soft reference will be cleared or the order in which a set of such references to different objects will be cleared. Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references.

Direct instances of this class may be used to implement simple caches; this class or derived subclasses may also be used in larger data structures to implement more sophisticated caches. As long as the referent of a soft reference is strongly reachable, that is, is actually in use, the soft reference will not be cleared. Thus a sophisticated cache can, for example, prevent its most recently used entries from being discarded by keeping strong referents to those entries, leaving the remaining entries to be discarded at the discretion of the garbage collector.
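The strategy in that last javadoc sentence can be sketched roughly as follows. This is only an illustration, not anything from the javadoc or from a real library: the class name `RecentlyUsedSoftCache`, the `strongCapacity` parameter, and the `isStronglyHeld` helper are all made up for the example. Every entry is held through a `SoftReference`, and the most recently used few are also held strongly, so the collector cannot clear them.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical cache: soft references for everything, strong references for recent entries. */
final class RecentlyUsedSoftCache<K, V> {
    // Every cached value is reachable through a soft reference.
    private final Map<K, SoftReference<V>> soft = new HashMap<>();
    // The most recently used values are also held strongly (access-ordered LRU).
    private final LinkedHashMap<K, V> recent;

    RecentlyUsedSoftCache(final int strongCapacity) {
        this.recent = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Dropping the strong reference leaves the value only softly reachable.
                return size() > strongCapacity;
            }
        };
    }

    synchronized void put(K key, V value) {
        soft.put(key, new SoftReference<>(value));
        recent.put(key, value);
    }

    synchronized V get(K key) {
        SoftReference<V> ref = soft.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either never cached, or the collector cleared the soft reference.
            soft.remove(key);
            return null;
        }
        recent.put(key, value); // promote back into the strongly held set
        return value;
    }

    /** True if the entry is currently protected from clearing by a strong reference. */
    synchronized boolean isStronglyHeld(K key) {
        return recent.containsKey(key);
    }
}
```

Which of the remaining softly held entries gets cleared first is still up to the collector; the sketch only controls which entries cannot be cleared at all.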

How do common JVM implementations, especially HotSpot, handle SoftReferences in practice? Do they "bias against clearing recently-created or recently-used soft references" as encouraged to by the spec?

Dave L. asked Mar 29 '12 19:03



1 Answer

It looks like it could be tunable, but it isn't. The concurrent mark-sweep collector relies on the default heap's implementation of must_clear_all_soft_refs(), which apparently returns true only when performing a _last_ditch_collection.

bool GenCollectedHeap::must_clear_all_soft_refs() {
  return _gc_cause == GCCause::_last_ditch_collection;
}

Normal handling of a failed allocation makes three successive calls to the heap's do_collection method, in CollectorPolicy.cpp:

HeapWord* GenCollectorPolicy::satisfy_failed_allocation(size_t size,
                                                    bool   is_tlab) {

This method tries to collect, tries to reallocate, tries to expand the heap if that fails, and then, as a last-ditch effort, tries a collection that clears soft references.

The comment on the last collection (the only one that triggers clearing soft refs) is quite telling:

  // If we reach this point, we're really out of memory. Try every trick
  // we can to reclaim memory. Force collection of soft references. Force
  // a complete compaction of the heap. Any additional methods for finding
  // free memory should be here, especially if they are expensive. If this
  // attempt fails, an OOM exception will be thrown.
  {
    IntFlagSetting flag_change(MarkSweepAlwaysCompactCount, 1); // Make sure the heap is fully compacted

    gch->do_collection(true             /* full */,
                       true             /* clear_all_soft_refs */,
                       size             /* size */,
                       is_tlab          /* is_tlab */,
                       number_of_generations() - 1 /* max_level */);
  }
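To make the order of those fallbacks concrete, here is a toy model in Java. It is not HotSpot code: `ToyHeap`, its abstract size units, and `satisfyFailedAllocation` are hypothetical names invented for this sketch, and it models only the sequence described in this answer (collect, retry, expand, then a last-ditch collection that clears soft references).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the fallback order described above; all names are hypothetical. */
final class FailedAllocationSketch {

    /** Stand-in for a heap, measured in abstract units. */
    static final class ToyHeap {
        int capacity;           // currently committed size
        final int maxCapacity;  // hard upper limit
        int live;               // strongly reachable data
        int garbage;            // unreachable data
        int softlyHeld;         // data reachable only through soft references
        final List<String> log = new ArrayList<>();

        ToyHeap(int capacity, int maxCapacity, int live, int garbage, int softlyHeld) {
            this.capacity = capacity;
            this.maxCapacity = maxCapacity;
            this.live = live;
            this.garbage = garbage;
            this.softlyHeld = softlyHeld;
        }

        int free() {
            return capacity - live - garbage - softlyHeld;
        }

        void collect(boolean clearAllSoftRefs) {
            garbage = 0;
            if (clearAllSoftRefs) {
                softlyHeld = 0; // softly held data is reclaimed only on request
            }
            log.add(clearAllSoftRefs ? "collect(clear soft refs)" : "collect");
        }

        boolean tryAllocate(int size) {
            if (free() < size) {
                return false;
            }
            live += size;
            return true;
        }

        boolean expand(int size) {
            int needed = size - free();
            if (capacity + needed > maxCapacity) {
                return false;
            }
            capacity += needed;
            log.add("expand");
            return true;
        }
    }

    /** Returns true if the allocation succeeded; false models an OutOfMemoryError. */
    static boolean satisfyFailedAllocation(ToyHeap heap, int size) {
        heap.collect(false);                      // ordinary collection first
        if (heap.tryAllocate(size)) {
            return true;
        }
        if (heap.expand(size) && heap.tryAllocate(size)) {
            return true;                          // growing the heap was enough
        }
        heap.collect(true);                       // last ditch: clear soft references
        return heap.tryAllocate(size);
    }
}
```

In this model, softly held data survives every request that an ordinary collection or a heap expansion can satisfy, and is given up only on the final attempt.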

--- Edited in response to the obvious, I was describing weak references, not soft ones ---

In practice, I would imagine that SoftReferences are only "not" followed when the JVM is called for garbage collection in response to an attempt to avoid an OutOfMemoryError.

For SoftReferences to be compatible with all four Java 1.4 garbage collectors, and with the new G1 collector, the decision must lie only with the reachability determination. By the time that reaping and compacting occur, it is far too late to decide if an object is reachable. This suggests (but does not require) that a collection "context" exists which determines reachability based on free memory availability in the heap. Such a context would have to indicate not following SoftReferences prior to attempting to follow them.

Since OutOfMemoryError-avoidance garbage collection is specially scheduled in a full-collection, stop-the-world manner, it is not hard to imagine a scenario in which the heap manager sets a "don't follow SoftReference" flag before the collection occurs.
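As a rough illustration of what such a flag would mean for reachability, here is a toy marking pass in Java. Nothing here is taken from a real collector: `Node`, its `strong` and `soft` edge lists, and `mark` are hypothetical names for this sketch, and the object graph is simulated with plain lists and does not use java.lang.ref at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy marking pass over a simulated object graph; all names are hypothetical. */
final class SoftMarkSketch {

    static final class Node {
        final String name;
        final List<Node> strong = new ArrayList<>(); // ordinary references
        final List<Node> soft = new ArrayList<>();   // simulated soft references

        Node(String name) {
            this.name = name;
        }
    }

    /** Returns the names of all nodes marked reachable from root, sorted. */
    static Set<String> mark(Node root, boolean followSoft) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (!seen.add(n)) {
                continue; // already marked
            }
            for (Node s : n.strong) {
                work.push(s);
            }
            if (followSoft) {
                // In a normal collection soft edges keep their targets alive.
                for (Node s : n.soft) {
                    work.push(s);
                }
            }
        }
        Set<String> names = new TreeSet<>();
        for (Node n : seen) {
            names.add(n.name);
        }
        return names;
    }
}
```

With followSoft set to false, anything reachable only through a soft edge ends up unmarked, which is the point at which a real collector could clear the reference and reclaim the object.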

--- Ok, so I decided that a "must work this way" answer just wasn't good enough ---

From the source code src/share/vm/gc_implementation/concurrentMarkSweep/vmCMSOperations.cpp (highlights are mine)

The operation to actually "do" garbage collection:

  void VM_GenCollectFullConcurrent::doit() {

We better be a VM thread, otherwise a "program" thread is garbage collecting!

    assert(Thread::current()->is_VM_thread(), "Should be VM thread");

We are a concurrent collector, so we better be scheduled concurrently!

    assert(GCLockerInvokesConcurrent || ExplicitGCInvokesConcurrent, "Unexpected");

Grab the heap (which has the GCCause object in it).

    GenCollectedHeap* gch = GenCollectedHeap::heap();

Check to see if we need a foreground "young" collection

    if (_gc_count_before == gch->total_collections()) {
      // The "full" of do_full_collection call below "forces"
      // a collection; the second arg, 0, below ensures that
      // only the young gen is collected. XXX In the future,
      // we'll probably need to have something in this interface
      // to say do this only if we are sure we will not bail
      // out to a full collection in this attempt, but that's
      // for the future.

Are the program threads not meddling with the heap?

      assert(SafepointSynchronize::is_at_safepoint(),
        "We can only be executing this arm of if at a safepoint");

Fetch the garbage collection cause (the reason for this collection) from the heap.

      GCCauseSetter gccs(gch, _gc_cause);

Do a full collection of the young space

Note that this passes in the value of the heap's must_clear_all_soft_refs flag, which in an OutOfMemory scenario must have been set to true, and in either case determines whether do_full_collection follows the soft references.

      gch->do_full_collection(gch->must_clear_all_soft_refs(),
                              0 /* collect only youngest gen */);

The _gc_cause is an enum, which is (guesswork here) set to _allocation_failure in the first attempt at avoiding OutOfMemoryError and to _last_ditch_collection after that fails (to attempt to collect transient garbage).

A quick look in the memory "heap" module shows that in do_full_collection, which calls do_collection, soft references are cleared explicitly (under the "right" conditions) with the line

    ClearedAllSoftRefs casr(do_clear_all_soft_refs, collector_policy());

--- Original post follows for those who want to learn about weak references ---

In the mark-and-sweep algorithm, soft references are not followed from the main thread (and thus their referents are not marked unless a different branch can reach them through non-soft references).

In the copy algorithm, objects that soft references point to are not copied (again, unless they are reached by a different non-soft reference).

Basically, when following the web of references from the "main" thread of execution, soft references are not followed. This allows their objects to be garbage collected just as if they didn't have references pointing to them.

It is important to mention that soft references are almost never used in isolation. They are typically used in objects where the design is to have multiple references to the object, but only one reference need be cleared to trigger garbage collection (for ease of maintaining the container, or run time performance of not needing to look up expensive references).

Edwin Buck answered Sep 22 '22 04:09
