Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the reason behind Enum.hashCode()?

Tags:

java

enums

hash

The method hashCode() in class Enum is final and defined as super.hashCode(), which means it returns a number based on the address of the instance, which is a random number from programmers POV.

Defining it e.g. as ordinal() ^ getClass().getName().hashCode() would be deterministic across different JVMs. It would even work a bit better, since the least significant bits would "change as much as possible", e.g., for an enum containing up to 16 elements and a HashMap of size 16, there'd be for sure no collisions (sure, using an EnumMap is better, but sometimes not possible, e.g. there's no ConcurrentEnumMap). With the current definition you have no such guarantee, have you?

Summary of the answers

Using Object.hashCode() compares to a nicer hashCode like the one above as follows:

  • PROS
    • simplicity
  • CONTRAS
    • speed
    • more collisions (for any size of a HashMap)
    • non-determinism, which propagates to other objects making them unusable for
      • deterministic simulations
      • ETag computation
      • hunting down bugs depending e.g. on a HashSet iteration order

I'd personally prefer the nicer hashCode, but IMHO no reason weights much, maybe except for the speed.

UPDATE

I was curious about the speed and wrote a benchmark with surprising results. For a price of a single field per class you can a deterministic hash code which is nearly four times faster. Storing the hash code in each field would be even faster, although negligibly.

The explanation why the standard hash code is not much faster is that it can't be the object's address as objects gets moved by the GC.

UPDATE 2

There are some strange things going on with the hashCode performance in general. When I understand them, there's still the open question, why System.identityHashCode (reading from the object header) is way slower than accessing a normal object field.

like image 399
maaartinus Avatar asked Feb 03 '11 10:02

maaartinus


People also ask

What is the purpose of the hashCode () method?

The purpose of the hashCode() method is to provide a numeric representation of an object's contents so as to provide an alternate mechanism to loosely identify it. By default the hashCode() returns an integer that represents the internal memory address of the object.

What is the valid use of hashCode?

hashCode() method is used to return integer representation of the object from memory address. It returns unique random integer for each instance and integer value won't remain same for several execution of application.

Is hashCode always unique in Java?

Hashcode is a unique code generated by the JVM at time of object creation. It can be used to perform some operation on hashing related algorithms like hashtable, hashmap etc. An object can also be searched with this unique code.


1 Answers

The only reason for using Object's hashCode() and for making it final I can imagine, is to make me ask this question.

First of all, you should not rely on such mechanisms for sharing objects between JVMs. That's simply not a supported use case. When you serialize / deserialize you should rely on your own comparison mechanisms or only "compare" the results against objects within your own JVM.

The reason for letting enums hashCode be implemented as Objects hash code (based on identity) is because, within one JVM there will only be one instance of each enum object. This is enough to ensure that such implementation makes sense and is correct.

You could argue like "Hey, String and the wrappers for the primitives (Long, Integer, ...) all have well defined, deterministic, specifications of hashCode! Why doesn't the enums have it?", Well, to begin with, you can have several distinct string references representing the same string which means that using super.hashCode would be an error, so these classes necessarily need their own hashCode implementations. For these core classes it made sense to let them have well-defined deterministic hashCodes.

Why did they choose to solve it like this?

Well, look at the requirements of the hashCode implementation. The main concern is to make sure that each object should return a distinct hash code (unless it is equal to another object). The identity-based approach is super efficient and guarantees this, while your suggestion does not. This requirement is apparently stronger than any "convenience bonus" about easing up on serialization etc.

like image 200
aioobe Avatar answered Sep 28 '22 04:09

aioobe