
What is the reason behind Enum.hashCode()?

Tags: java, enums, hash

The method hashCode() in class Enum is final and defined as super.hashCode(), which means it returns a number based on the address of the instance. From the programmer's point of view, that is effectively a random number.

Defining it as, for example, ordinal() ^ getClass().getName().hashCode() would make it deterministic across different JVMs. It would even work a bit better, since the least significant bits would "change as much as possible": for an enum containing up to 16 elements and a HashMap of size 16, there would be no collisions at all. (Using an EnumMap is better, of course, but that is not always possible; for example, there is no ConcurrentEnumMap.) With the current definition you have no such guarantee, do you?
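Because Enum.hashCode() is final, the formula above cannot be placed in an enum as an override. A minimal sketch as a static helper might look like the following. The names DeterministicEnumHash, Level, hash and distinctBuckets are hypothetical, and the sketch assumes getDeclaringClass() in place of getClass() so that constants with their own bodies hash consistently. The bucket index here is simply the low bits of the hash; the real HashMap additionally mixes in higher bits.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: Enum.hashCode() is final, so the proposed formula
// has to live outside the enum itself.
final class DeterministicEnumHash {

    // Hypothetical example enum with fewer than 16 constants.
    enum Level { LOW, MEDIUM, HIGH, CRITICAL }

    private DeterministicEnumHash() {}

    // The formula from the question. getDeclaringClass() replaces getClass()
    // (an assumption of this sketch) so constants with bodies behave the same.
    static int hash(Enum<?> e) {
        return e.ordinal() ^ e.getDeclaringClass().getName().hashCode();
    }

    // Counts distinct bucket indexes for a power-of-two table, using the
    // low bits of the hash as the index.
    static int distinctBuckets(Enum<?>[] constants, int tableSize) {
        Set<Integer> buckets = new HashSet<>();
        for (Enum<?> e : constants) {
            buckets.add(hash(e) & (tableSize - 1));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        Level[] all = Level.values();
        // Checks that every constant lands in its own bucket of a 16-slot table.
        System.out.println(distinctBuckets(all, 16) == all.length);
    }
}
```

Since the ordinals of an enum with at most 16 constants differ only in the lowest four bits, XOR-ing them with a per-class constant keeps those four bits pairwise distinct.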

Summary of the answers

Compared with a nicer hashCode like the one above, using Object.hashCode() fares as follows:

PROS
- simplicity
CONS
- speed
- more collisions (for any size of a HashMap)
- non-determinism, which propagates to other objects making them unusable for
  - deterministic simulations
  - ETag computation
  - hunting down bugs depending e.g. on a HashSet iteration order

I'd personally prefer the nicer hashCode, but IMHO none of these reasons carries much weight, except maybe the speed.

UPDATE

I was curious about the speed and wrote a benchmark with surprising results. For the price of a single field per class you can get a deterministic hash code which is nearly four times faster. Storing the hash code in each instance would be even faster, although only negligibly. (Benchmark chart: https://i.stack.imgur.com/rtb85.png)
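The "single field per class" idea could be sketched roughly as below. This is not the benchmark code, which is not shown here; Planet, CLASS_HASH, stableHash and StableHashDemo are hypothetical names, and the method needs its own name because Enum.hashCode() cannot be overridden.

```java
// Hypothetical enum illustrating the "one field per class" idea.
enum Planet {
    MERCURY, VENUS, EARTH, MARS;

    // Computed once per class. String.hashCode() is specified by the
    // Java API, so this value is the same on every JVM.
    private static final int CLASS_HASH = Planet.class.getName().hashCode();

    // Deterministic hash: one static field read, one ordinal read, one XOR.
    int stableHash() {
        return ordinal() ^ CLASS_HASH;
    }
}

final class StableHashDemo {
    public static void main(String[] args) {
        for (Planet p : Planet.values()) {
            System.out.println(p + " -> " + p.stableHash());
        }
    }
}
```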

The reason the standard hash code is not much faster is that it cannot simply be the object's address, because objects get moved by the GC.
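The point about moving objects can be illustrated with a small sketch (IdentityHashStability and stableAcrossGc are hypothetical names). The hashCode contract requires the value to stay the same for the lifetime of the object within one run, regardless of whether the collector relocates it, so the identity hash has to be stored or derived independently of the current address.

```java
final class IdentityHashStability {

    private IdentityHashStability() {}

    // Returns true if the identity hash of the object is unchanged after
    // a garbage collection has been requested.
    static boolean stableAcrossGc(Object o) {
        int before = System.identityHashCode(o);
        System.gc(); // only a hint; the collector may or may not move the object
        return before == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        System.out.println(stableAcrossGc(new Object()));
    }
}
```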

UPDATE 2

There are some strange things going on with hashCode performance in general. Even once I understand them, there remains the open question of why System.identityHashCode (which reads from the object header) is far slower than accessing a normal object field.

399

asked Feb 03 '11 10:02

maaartinus

1 Answer

> The only reason for using Object's hashCode() and for making it final I can imagine, is to make me ask this question.

First of all, you should not rely on such mechanisms for sharing objects between JVMs. That's simply not a supported use case. When you serialize / deserialize you should rely on your own comparison mechanisms or only "compare" the results against objects within your own JVM.

The reason enums' hashCode is implemented as Object's hash code (based on identity) is that within one JVM there will only ever be one instance of each enum constant. This is enough to ensure that such an implementation makes sense and is correct.
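A small sketch of this argument, using a hypothetical Color enum and helper names: looking a constant up by name yields the very same instance, and its hashCode() is the identity hash code.

```java
// Hypothetical enum used only for illustration.
enum Color { RED, GREEN }

final class EnumIdentityDemo {

    private EnumIdentityDemo() {}

    // valueOf returns the one existing instance, so == comparison holds.
    static boolean sameInstance() {
        return Color.valueOf("RED") == Color.RED;
    }

    // Enum.hashCode() delegates to Object.hashCode(), which is the value
    // System.identityHashCode reports.
    static boolean usesIdentityHash() {
        return Color.RED.hashCode() == System.identityHashCode(Color.RED);
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());
        System.out.println(usesIdentityHash());
    }
}
```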

You could argue, "Hey, String and the wrappers for the primitives (Long, Integer, ...) all have well-defined, deterministic specifications of hashCode! Why don't enums have one?" Well, to begin with, you can have several distinct String objects representing the same string, which means that using super.hashCode would be an error, so these classes necessarily need their own hashCode implementations. For these core classes it made sense to let them have well-defined, deterministic hash codes.
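The contrast with String and the wrappers can be seen in a short sketch (ValueHashDemo and its method names are hypothetical). Two distinct String objects with equal content must hash alike, and the specified formulas make the values reproducible on any JVM.

```java
final class ValueHashDemo {

    private ValueHashDemo() {}

    // Two distinct objects with equal content: identity differs, hash agrees.
    static boolean equalStringsHashAlike() {
        String a = "abc";
        String b = new String("abc"); // deliberately a separate object
        return a != b && a.equals(b) && a.hashCode() == b.hashCode();
    }

    // String.hashCode() is specified as s[0]*31^(n-1) + ... + s[n-1].
    static int abcHash() {
        return "abc".hashCode(); // 97*961 + 98*31 + 99 = 96354
    }

    // Integer.hashCode() is specified to be the int value itself.
    static int integerHash(int value) {
        return Integer.valueOf(value).hashCode();
    }

    public static void main(String[] args) {
        System.out.println(equalStringsHashAlike());
        System.out.println(abcHash());
        System.out.println(integerHash(42));
    }
}
```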

> Why did they choose to solve it like this?

Well, look at the requirements of the hashCode implementation. The main concern is that each object returns a distinct hash code (unless it is equal to another object). The identity-based approach is super efficient and, as far as is reasonably practical, achieves this, while your suggestion does not guarantee it (constants of different enum types could collide). This requirement is apparently stronger than any "convenience bonus" such as easier serialization.

200

answered Sep 28 '22 04:09

aioobe
