Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java String hashcode caching

One of the advantages of String immutability is hashcode caching for faster access.

  • In this case how cache is handled for string which has same hashcode?
  • Does it really improve the performance in this case?
like image 719
Sivasubramaniam Arunachalam Avatar asked May 14 '12 02:05

Sivasubramaniam Arunachalam


4 Answers

In this case how cache is handled for string which has same hashcode?

What is being cached is the hashcode of the string. It is cached in a private int field in the string itself. It doesn't make any difference that different Strings may have the same hashcode ... because the hashcode is stored in the respective String objects.

(The thing that matters most is that two strings that have the same sequence of characters (and hence are equal) have the same hashcode value. And that is guaranteed because the hashcode algorithm for Java strings is standardized ... and has this property.)

Does it really improve the performance in this case?

On average, yes, and more so as the string lengths get larger.

A decent string hashcode algorithm needs to look at every character in the string ... otherwise similar strings can end up systematically mapping to the same hashcode (and that is BAD). Avoiding lookiing at those N characters multiple times is a big win.

The only significant cases where caching wouldn't help would be:

  • when most string hashcodes are used once only, or
  • when most strings are really short.

(There's one more really obscure case. If a String hashes to 0, then caching will be ineffective. This is because the String class uses 0 in the cache field to say that the hashcode hasn't been cached.)

like image 179
Stephen C Avatar answered Oct 02 '22 21:10

Stephen C


In this case how cache is handled for string which has same hashcode?

I don't understand the first part of your question. Cache is handled for all Strings the same, whether hashcodes are the same or not (since two different Strings can theoretically have the same hashCode, and so if hashCodes are equal, it doesn't mean that the Strings are equal). But if the same String object is used, the hashCode does not have to be recalculated since it is cached.

Does it really improve the performance?

Unequivocally YES

like image 39
Hovercraft Full Of Eels Avatar answered Oct 02 '22 22:10

Hovercraft Full Of Eels


The cache is just an int field within the String object. Multiple Strings can have the same hashcode without issue.

It significantly helps performance because:

  • Calculating a hashcode is much more expensive than reading a single int field
  • If you calculate the hashcode of a string once, it is likely that you will want to calculate the hashcode of the String many more times (e.g. it is being used in a hashmap key)

If you are interested, worth taking a look at the source:

http://www.docjar.com/html/api/java/lang/String.java.html

like image 43
mikera Avatar answered Oct 02 '22 21:10

mikera


In most cases, the hashCode is not calculated until you attempt to put the string to a HashMap. The map then caches it in Map.Entry anyway to speed up comparisons and rehashing.

like image 26
Zdeněk Pavlas Avatar answered Oct 02 '22 22:10

Zdeněk Pavlas