Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What causes the slightly unpredictable ordering of the iterator() for the java.util.HashSet and HashMap.keySet() classes?

Six years ago, I burned several days trying to hunt down where my perfectly deterministic framework was responding randomly. After meticulously chasing the entire framework ensuring that it was all using the same instance of Random, I then kept chasing by single stepping code. It was highly repetitive iterative self-calling code. Worse, the damn effect would only show up after a huge number of iterations were completed. And after +6 hours, I was finally at wits end when I discovered a line in the javadoc for HashSet.iterator() indicating it doesn't guarantee the order in which it will return elements. I then went through my entire code base and replaced all instances of HashSet with LinkedHashSet. And low-and-behold, my framework sprang right to deterministic life! ARGH!

I have now just experienced this same FREAKIN affect, again (at least it was only 3 hours this time). For whatever reason, I missed the small detail that HashMap happens to BEHAVE THE SAME WAY for its keySet().

Here's an SO thread on this subject, although the discussion never quite answers my question: Iteration order of HashSet

So, I am curious as to why this might occur. Given both times I had a huge single threaded java application crawling through exactly the same instantiation/insertion space with exactly the same JVM parameters (multiple runs from the same batch file) on the same computer with almost nothing else running, what could possibly perturb the JVM such that HashSet and HashMap would, after an enormous number of iterations, behave unpredictably (not inconsistenly as the javadoc says not to depend upon the order)?

Any ideas around this from either the source code (implementation of these classes in java.util) or from your knowledge of the JVM (perhaps some GC affect where internal java classes get non-zeroed memory when allocating internal memory spaces)?

like image 754
chaotic3quilibrium Avatar asked Dec 11 '10 20:12

chaotic3quilibrium


1 Answers

I've struck this before, where the order wasn't important, but did affect the results.

The multi-threaded nature of Java means that repeated runs with exactly the same inputs can be affected by slight timing differences in (for example) how long it takes to allocate a new block of memory, which might sometimes require paging out to disk the previous contents, and at other times that isn't needed. Some other thread not using that page may proceed, and you could end up with a different order of object creation, when System objects are taken into account.

That can affect the Object.hashCode() result for the equivalent object in different runs of the JVM.

For me, I decided to add the small overhead of using a LinkedHashMap, in order to be able to reproduce the results of the tests I was running.

like image 136
Stephen Denne Avatar answered Oct 04 '22 22:10

Stephen Denne