I recently raised a question in stackoverflow, then found the answer. The initial question was What mechanisms other than mutexs or garbage collection can slow my multi-threaded java program?
I discovered to my horror that HashMap has been modifed between JDK1.6 and JDK1.7. It now has a block of code that causes all threads creating HashMaps to synchronize.
The line of code in JDK1.7.0_10 is
/**A randomizing value associated with this instance that is applied to hash code of keys to make hash collisions harder to find. */
transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);
Which ends up calling
protected int next(int bits) {
long oldseed, nextseed;
AtomicLong seed = this.seed;
do {
oldseed = seed.get();
nextseed = (oldseed * multiplier + addend) & mask;
} while (!seed.compareAndSet(oldseed, nextseed));
return (int)(nextseed >>> (48 - bits));
}
Looking in other JDKs, I find this isn't present in JDK1.5.0_22, or JDK1.6.0_26.
The impact on my code is huge. It makes it so that when I run on 64 threads, I get less performance than when I run on 1 thread. A JStack shows that most of the threads are spending most of their time spinning in that loop in Random.
So I seem to have some options:
Before I start down any of these paths (all look very time consuming and potentially high impact), I wondered if I have missed an obvious trick. Can any of you stack overflow people suggest which is the better path, or perhaps identify a new idea.
Thanks for the help
I am the original author of the patch which appeared in 7u6, CR#7118743 : Alternative Hashing for String with Hash-based Maps.
I'll acknowledge right up front that the initialization of hashSeed is a bottleneck but it is not one we expected to be a problem since it only happens once per Hash Map instance. For this code to be a bottleneck you would have to be creating hundreds or thousands of hash maps per second. This is certainly not typical. Is there really a valid reason for your application to be doing this? How long do these hash maps live?
Regardless, we will probably investigate switching to ThreadLocalRandom rather than Random and possibly some variant of lazy initialization as suggested by cambecc.
EDIT 3
A fix for the bottleneck was been pushed to the JDK7 update mercurial repo:
http://hg.openjdk.java.net/jdk7u/jdk7u-dev/jdk/rev/b03bbdef3a88
The fix will be part of the upcoming 7u40 release and is already available in IcedTea 2.4 releases.
Near final test builds of 7u40 are available here:
https://jdk7.java.net/download.html
Feedback is still welcomed. Send it to http://mail.openjdk.java.net/mailman/listinfo/core-libs-dev to be sure it gets seen by the openJDK devs.
This looks like a "bug" you can work around. There is a property that disables the new "alternative hashing" feature:
jdk.map.althashing.threshold = -1
However, disabling alternative hashing is not sufficient because it does not turn off the generation of a random hash seed (though it really should). So even if you turn off alt hashing, you still have thread contention during hash map instantiation.
One particularly nasty way of working around this is to forcefully replace the instance of Random
used for hash seed generation with your own non-synchronized version:
// Create an instance of "Random" having no thread synchronization.
Random alwaysOne = new Random() {
@Override
protected int next(int bits) {
return 1;
}
};
// Get a handle to the static final field sun.misc.Hashing.Holder.SEED_MAKER
Class<?> clazz = Class.forName("sun.misc.Hashing$Holder");
Field field = clazz.getDeclaredField("SEED_MAKER");
field.setAccessible(true);
// Convince Java the field is not final.
Field modifiers = Field.class.getDeclaredField("modifiers");
modifiers.setAccessible(true);
modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);
// Set our custom instance of Random into the field.
field.set(null, alwaysOne);
Why is it (probably) safe to do this? Because alt hashing has been disabled, causing the random hash seeds to be ignored. So it doesn't matter that our instance of Random
isn't in fact random. As always with nasty hacks like this, please use with caution.
(Thanks to https://stackoverflow.com/a/3301720/1899721 for the code that sets static final fields).
FWIW, the following change to HashMap
would eliminate the thread contention when alt hashing is disabled:
- transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);
+ transient final int hashSeed;
...
useAltHashing = sun.misc.VM.isBooted() &&
(capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
+ hashSeed = useAltHashing ? sun.misc.Hashing.randomHashSeed(this) : 0;
init();
A similar approach can be used for ConcurrentHashMap
, etc.
There are lots of apps out there that create a transient HashMap per record in big data applications. This parsers and serializers, for example. Putting any synchronization into unsynchronized collections classes is a real gotcha. In my opinion, this is unacceptable and needs to be fixed ASAP. The change that was apparently introduced in 7u6, CR#7118743 should be reverted or fixed without requiring any synchronization or atomic operation.
Somehow this is reminds me of the colossal mistake of making StringBuffer and Vector and HashTable synchronized in JDK 1.1/1.2. People paid dearly for years for that mistake. No need to repeat that experience.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With