Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java Best Practice regarding clearing a Linked List

I am coding a data structure Linked list in java(For my learning sake I am not using any standard java libraries) and I want to clear the data structure via null out the references. Please suggest me which approach is better

1) Just null the start reference of the list and that will suffice.

2) Apart from nulling out the start , I de-reference all next pointers of all internal nodes to null.Does this help the garbage collector in any way?

The confusion in I see approach 2 is followed in JDK for LinkedList implementation.But i don't see the same for TreeMap

I am using JDK 8

like image 531
Debapriya Biswas Avatar asked Oct 18 '22 21:10

Debapriya Biswas


1 Answers

This is an interesting question, and the answer has a long history with subtle tradeoffs.

The short answer is, clearing the references is not necessary for the correct operation of the data structure. Since you're learning data structures, I'd suggest that you not worry about this issue in your own implementation. My hunch (though I haven't benchmarked this) is that any benefit that might accrue from clearing all the link nodes will rarely be noticeable under typical conditions.

(In addition, it's likely that under typical conditions, LinkedList will be outperformed by ArrayList or ArrayDeque. There are benchmarks that illustrate this. It's not too difficult to come up with workloads where LinkedList outperforms the others, but it's rarer than people think.)

I was quite surprised to learn that the clear operation of LinkedList unlinks all the nodes from each other and from the contained element. Here's a link to the code in JDK 8. This change dates back to 2003, and the change appeared in JDK 5. This change was tracked by the bug JDK-4863813. That change (or a slightly earlier one) clears the next and previous references from individual nodes when they're unlinked from the list. There's also a test case in that bug report that's of some interest.

The problem seems to be that it is possible to make changes to the LinkedList, which creates garbage nodes, faster than the garbage collector can reclaim them. This eventually causes the JVM to run out of memory. The fact that the garbage nodes are all linked together also seems to have the effect of impeding the garbage collector, making it easier for the mutator threads to outstrip the collector threads. (It's not clear to me how important it is to have multiple threads mutating the list. In the test case they all synchronize on the list, so there's no actual parallelism.) The change to LinkedList to unlink the nodes from each other makes it easier for the collector to do its work, and so apparently makes the test no longer run out of memory.

Fast forward to 2009, when the LinkedList code was given a "facelift." This was tracked by bug JDK-6897553 and discussed in this email review thread. One of the original motivations for the "facelift" was to reduce the clear() operation from O(n) to O(1), as unlinking all the nodes seemed unnecessary to that bug's submitter. (It certainly seemed unnecessary to me!) But after some discussion, it was decided that the unlinking behavior provided enough benefit to the garbage collector to retain it and to document it.

The comment also says that unlinking the nodes

is sure to free memory even if there is a reachable Iterator

This refers to a somewhat pathological case like the following:

// fields in some class
List<Obj> list = createAndPopulateALinkedList();
Iterator<Object> iterator;

void someMethod() {
    iterator = list.iterator();
    // ...
    list.clear();
}

The iterator points to one of the linked list's nodes. Even though the list has been cleared, the iterator still keeps a node alive, and since that node has next and previous references, all of the nodes formerly in the list are still alive. Unlinking all the nodes in clear() lets these be collected. I think this is pretty pathological, though, since it's rare for an iterator to be stored in a field. Usually iterators are created, used, and discarded within a single method, most often within a single for loop.

Now, regarding TreeMap. I don't think there's a fundamental reason why LinkedList unlinks its nodes whereas TreeMap does not. One might like to think that the entire JDK code base is maintained consistently, so that if it's good practice for LinkedList to unlink its nodes, this also ought to have been done to TreeMap. Alas, this is not the case. Most likely what happened is that a customer ran into the pathological behavior with LinkedList and the change was made there, but nobody has ever observed similar behavior with TreeMap. Thus there was no impetus to update TreeMap.

like image 111
Stuart Marks Avatar answered Oct 21 '22 14:10

Stuart Marks