Some hash table schemes, such as cuckoo hashing or dynamic perfect hashing, rely on the existence of universal hash functions and the ability to take a collection of data exhibiting collisions and resolve those collisions by picking a new hash function from the family of universal hash functions.
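For concreteness, here is a minimal sketch (my own illustration, not something from the question; the class name is made up) of one well-known universal family, multiply-shift hashing. Each call picks a fresh random odd multiplier, and drawing a new multiplier is what it means to "pick a new hash function from the family" when a scheme like cuckoo hashing needs to rehash.

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

// Sketch of the multiply-shift universal family over 32-bit keys:
// h_a(x) = (a * x) >>> (32 - bits), with a a random odd 32-bit multiplier.
// Each call to pick(...) draws a new function from the family.
final class MultiplyShiftFamily {
    static IntUnaryOperator pick(int bits, Random rng) {
        final int a = rng.nextInt() | 1;       // random odd multiplier
        return x -> (a * x) >>> (32 - bits);   // keep the top 'bits' bits of the product
    }
}
```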
A while ago I was trying to implement a hash table in Java backed by cuckoo hashing and ran into trouble because while all Java objects have a hashCode function, the value that hashCode returns is fixed for each object (unless, of course, the objects change). This means that without the user providing an external family of universal hash functions, it's impossible to build a hash table that relies on universal hashing.
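To illustrate what "the user providing an external family" could look like, here is a hypothetical interface; the name HashFamily and its shape are my own invention, not anything in the JDK. A cuckoo-hashing table would hold two functions drawn from it and ask for fresh ones whenever it has to rehash.

```java
import java.util.Random;
import java.util.function.ToIntFunction;

// Hypothetical shape for a user-supplied family of hash functions over T.
// A cuckoo-hashing table would draw its two functions from here, and draw
// fresh ones whenever an insertion fails and the table must be rehashed.
interface HashFamily<T> {
    ToIntFunction<T> newFunction(Random rng);
}
```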
Initially I thought that I could get around this by applying a universal hash function to the objects' hashCodes directly, but this doesn't work: if two objects have the same hashCode, then any deterministic function you apply to those hash codes, even a randomly chosen hash function, will map them to the same value and thus still cause a collision.
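A concrete way to see this, using the well-known fact that "Aa" and "BB" share the same String.hashCode() (both are 2112); the multiply-shift function below stands in for "a randomly-chosen hash function":

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are a classic String.hashCode() collision: both hash to 2112.
        String a = "Aa", b = "BB";
        System.out.println(a.hashCode() == b.hashCode());   // true

        // Draw a "random" hash function and apply it to the hash codes:
        // the inputs are equal, so the outputs are equal, on every draw.
        Random rng = new Random();
        int mul = rng.nextInt() | 1;                          // random odd multiplier (multiply-shift)
        IntUnaryOperator h = x -> (mul * x) >>> 16;
        System.out.println(h.applyAsInt(a.hashCode()) == h.applyAsInt(b.hashCode())); // always true
    }
}
```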
This seems like an unfortunate consequence of Java's design. It means that HashMap and the other standard hash containers cannot use tables based on universal hashing, even if the language designers thought such tables would be appropriate, and it makes it harder for third-party library designers to build hash tables of this sort as well.
My question is: is there a reason that Java opted to design hashCode without considering the possibility of hashing objects with multiple hash functions? I understand that many good hashing schemes like chained hashing or quadratic probing don't require it, but it seems as though the decision makes it hard to use certain classes of algorithms on Java objects.
A few related observations. String.hashCode() is specified by the Javadoc, so it is deterministic; for strings, numbers, and the collection classes, hashCode() returns a consistent value, apparently even across different JVM vendors. Arrays, on the other hand, use the default identity-based hash code, which is why IntelliJ warns about array properties in data classes. System.identityHashCode returns the same hash code for the given object as would be returned by the default hashCode() method, whether or not the given object's class overrides hashCode(), and the hash code for the null reference is zero. So even if hashCode() is overridden, that does not affect it.
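A small snippet illustrating the identityHashCode contract quoted above (the identity hash value will of course vary from run to run):

```java
public class IdentityHashDemo {
    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.hashCode());                  // 99162322: content-based, same on every JVM
        System.out.println(System.identityHashCode(s));    // identity-based, ignores String's override
        System.out.println(System.identityHashCode(null)); // 0, per the documented contract
    }
}
```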
Simplicity. Java allows class designers to provide their own hashCode, which as you mention is good enough for "ordinary" hash tables, and can be hard enough to understand.
Besides, when the Java Collections API was designed, having generic hash tables in the standard library was bold enough a move already. C has never had them. C++ had them in the STL as hash_set and hash_map, but those didn't make it into the standard. Only now, in C++0x, are hash tables being considered for standardization again.