Why does HashSet allow equal items if hashcodes are different?

Tags:

The HashSet class has an add(Object o) method, which is not inherited from another class. The Javadoc for that method says the following:

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

In other words, if two objects are equal, then the second object will not be added and the HashSet will remain the same. However, I've discovered that this is not true if objects e and e2 have different hashcodes, despite the fact that e.equals(e2). Here is a simple example:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;

public class BadHashCodeClass {

    /**
     * A hashcode that will randomly return an integer, so it is unlikely to be the same
     */
    @Override
    public int hashCode(){
        return new Random().nextInt();
    }

    /**
     * An equal method that will always return true
     */
    @Override
    public boolean equals(Object o){
        return true;
    }

    public static void main(String... args){
        HashSet<BadHashCodeClass> hashSet = new HashSet<>();
        BadHashCodeClass instance = new BadHashCodeClass();
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Elements in hashSet: " + hashSet.size());

        Iterator<BadHashCodeClass> iterator = hashSet.iterator();
        BadHashCodeClass e = iterator.next();
        BadHashCodeClass e2 = iterator.next();
        System.out.println("Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): " + (e==null ? e2==null : e.equals(e2)));
    }

The results from the main method are:

Instance was added: true
Instance was added: true
Elements in hashSet: 2
Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): true

As the example above clearly shows, HashSet was able to add two elements where e.equals(e2).

I'm going to assume that this is not a bug in Java and that there is in fact some perfectly rational explanation for why this is. But I can't figure out what exactly. What am I missing?

747

asked Jan 06 '14 06:01

Thunderforge

1 Answers

I think what you're really trying to ask is:

"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?"

The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.

I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.

The equals() documentations states (emphasis added):

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.

If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:

static class Entry<K,V> implements Map.Entry<K,V> {
  final K key;
  V value;
  Entry<K,V> next;
  int hash;
  ...
}

HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.

Now lets look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:

public V put(K key, V value) {
  ...
  int hash = hash(key);
  int i = indexFor(hash, table.length);
  for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
      // Replace existing element and return
    }
  }
  // Insert new element
}

In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.

Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.

Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?

100

answered Sep 23 '22 23:09

dimo414

Related questions
                            
                                How to convert a char array to a string array
                            
                                Sorting an ArrayList of Objects by Last name and firstname in Java
                            
                                Class path entry is missing in Eclipse
                            
                                How can I use a custom class in a TreeSet?
                            
                                Android: DigitalClock remove seconds
                            
                                glassfish in eclipse doesn't seem to see my JDK
                            
                                Accessing the full url, including hostname with jstl
                            
                                Why do we say that a static method in Java is not a virtual method?
                            
                                How to install JPype on OS X Lion to use with Neo4j?
                            
                                Java Local Nested Classes and accessing super methods
                            
                                jUnit and Guava, comparing list equality after transform()
                            
                                Using HashMap in multithreaded environment
                            
                                How to write text on an image
                            
                                Why do I get Exception java.lang.NoClassDefFoundError org/codehaus/groovy/? [duplicate]
                            
                                Remove multiple linebreaks
                            
                                Can I null-check in-line?
                            
                                Accessing package-private fields in Java
                            
                                java BigDecimal subraction failing
                            
                                Use dynamic R strings in Android
                            
                                Checking if all characters of a string is uppercase except special symbols

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why does HashSet allow equal items if hashcodes are different?

Tags:

java

hashcode

hashset

Thunderforge

People also ask

1 Answers

dimo414

Recent Activity

Donate For Us