
Is String.hashCode() portable across VMs, JDKs and OSs?

Tags:

java

hashcode

An interesting issue came up recently. We came across some code that uses hashCode() as a salt source for MD5 hashing, which raises the question: will hashCode() return the same value for the same object on different VMs, different JDK versions and operating systems? Even if it's not guaranteed, has it changed at any point up until now?

EDIT: I really mean String.hashCode() rather than the more general Object.hashCode(), which of course can be overridden.

cletus asked Oct 10 '08 07:10



2 Answers

No. From http://tecfa.unige.ch/guides/java/langspec-1.0/javalang.doc1.html:

The general contract of hashCode is as follows:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode must consistently return the same integer. The integer may be positive, negative, or zero. This integer does not, however, have to remain consistent from one Java application to another, or from one execution of an application to another execution of the same application. [...]
John Millikin answered Nov 06 '22 21:11


It depends on the type:

  • If you've got a type which hasn't overridden hashCode() then it will probably return a different hashCode() each time you run the program.
  • If you've got a type which overrides hashCode() but doesn't document how it's calculated, it's perfectly legitimate for an object with the same data to return a different hash on each run, so long as it returns the same hash for repeated calls within the same run.
  • If you've got a type which overrides hashCode() in a documented manner, i.e. the algorithm is part of the documented behaviour, then you're probably safe. (java.lang.String documents this, for example.) However, I'd still steer clear of relying on this on general principle, personally.
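To make the last bullet concrete, here is a minimal sketch of the algorithm that the java.lang.String Javadoc documents: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated in int arithmetic. The class name StringHashSketch and the method documentedHash are hypothetical names chosen for illustration; the sketch only recomputes the documented formula and compares it with the built-in result on whatever JVM runs it.

```java
class StringHashSketch {
    // Hypothetical helper: recomputes the hash from the formula documented
    // for java.lang.String, using Horner's rule. int overflow wraps around,
    // which is what the documented int arithmetic implies.
    static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "abc", "String.hashCode()"};
        for (String s : samples) {
            // Print both values side by side so they can be compared
            // across different VMs, JDK versions and operating systems.
            System.out.println("\"" + s + "\": documented=" + documentedHash(s)
                    + " builtin=" + s.hashCode());
        }
    }
}
```

If the two columns ever disagreed on some VM, that VM would not be following the documented behaviour. This shows the values match where the sketch is run; it does not by itself prove portability.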

Just a cautionary tale from the .NET world: I've seen at least a few people in a world of pain through using the result of string.GetHashCode() as their password hash in a database. The algorithm changed between .NET 1.1 and 2.0, and suddenly all the hashes are "wrong". (Jeffrey Richter documents an almost identical case in CLR via C#.) When a hash does need to be stored, I'd prefer it to be calculated in a way which is always guaranteed to be stable - e.g. MD5 or a custom interface implemented by your types with a guarantee of stability.
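As a rough illustration of the "calculated in a way which is always guaranteed to be stable" option, the sketch below derives a value from a string with MD5 via java.security.MessageDigest instead of hashCode(). The class name StableDigestSketch and the method md5Hex are hypothetical, and the choice of UTF-8 is an assumption: the point is that the byte encoding has to be fixed explicitly, otherwise the platform default charset could change the result between machines.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StableDigestSketch {
    // Hypothetical helper: returns the MD5 digest of the input as lowercase hex.
    // The input is encoded as UTF-8 explicitly (an assumption of this sketch)
    // so the result does not depend on the platform default charset.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform is required to provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello"));
    }
}
```

MD5 is used here only because the answer names it as an example; the same MessageDigest pattern works with another algorithm name such as "SHA-256".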

Jon Skeet answered Nov 06 '22 20:11