Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I prove that Object.hashCode() can produce same hash code for two different objects in Java?

Had a discussion with an interviewer regarding internal implementation of Java Hashmaps and how it would behave if we override equals() but not the HashCode() method for an Employee<Emp_ID, Emp_Name> object.

I was told that hashCode for two different objects would never be the same for the default object.hashCode() implementation, unless we overrode the hashCode() ourselves.

From what I remembered, I told him that Java Hashcode contracts says that two different objects "may" have the same hashcode() not that it "must".

According to my interviewer, the default object.hashcode() never returns the same hashcode() for two different objects, Is this true?

Is it even remotely possible to write a code that demonstrates this. From what I understand, Object.hashcode() can produce 2^30 unique values, how does one produce a collision, with such low possibility of collision to demonstrate that two different objects can get the same hashcode() with the Object classes method.

Or is he right, with the default Object.HashCode() implementation, we will never have a collision i.e two different objects can never have the same HashCode. If so, why do so many java manuals don't explicitly say so.

How can I write some code to demonstrate this? Because on demonstrating this, I can also prove that a bucket in a hashmap can contain different HashCodes(I tried to show him the debugger where the hashMap was expanded but he told me that this is just logical Implementation and not the internal algo?)

like image 999
Sameer Avatar asked Dec 02 '16 11:12

Sameer


People also ask

Can 2 unequal objects have the same hashCode if its function is implemented correctly?

If two objects are not equal then they cannot have the same hashcode.

How you write hashCode What if it returns different hashCode for same object?

Whenever it(hashcode) is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.

Can two different strings have same hashCode?

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.

How will you retrieve value object if two keys will have the same hashCode?

If two key have the same hashCode (which doesn't mean they are identical), they will be stored in the same linked list inside the HashMap (assuming you are asking about HashMap s), and the value to be returned will be determined be comparing all the keys that have the same hashCode with the requested key (using equals ...


2 Answers

2^30 unique values sounds like a lot but the birthday problem means we don't need many objects to get a collision.

The following program works for me in about a second and gives a collision between objects 196 and 121949. I suspect it will heavily depend on your system configuration, compiler version etc.

As you can see from the implementation of the Hashable class, every one is guarenteed to be unique and yet there are still collisions.

class HashCollider
{
    static class Hashable
    {
        private static int curr_id = 0;
        public  final  int id;

        Hashable()
        {
            id = curr_id++;
        }
    }

    public static void main(String[] args)
    {
        final int NUM_OBJS = 200000; // birthday problem suggests
                                     // this will be plenty

        Hashable objs[] = new Hashable[NUM_OBJS];  
        for (int i = 0; i < NUM_OBJS; ++i) objs[i] = new Hashable();

        for (int i = 0; i < NUM_OBJS; ++i)
        {
            for (int j = i + 1; j < NUM_OBJS; ++j)
            {
                if (objs[i].hashCode() == objs[j].hashCode())
                {
                    System.out.println("Objects with IDs " + objs[i].id
                                     + " and " + objs[j].id + " collided.");
                    System.exit(0);
                }
            }
        }

        System.out.println("No collision");
    }
}
like image 165
Michael Avatar answered Sep 22 '22 04:09

Michael


If you have a large enough heap (assuming 64 bit address space) and objects are small enough (the smallest object size on a 64 bit JVM is 8 bytes), then you will be able to represent more than 2^32 objects that are reachable at the same time. At that point, the objects' identity hashcodes cannot be unique.

However, you don't need a monstrous heap. If you create a large enough pool of objects (e.g. in a large array) and randomly delete and recreate them, it is (I think) guaranteed that you will get a hashcode collision ... if you continue doing this long enough.

  • The default algorithm for hashcode in older versions of Java is based on the address of the object when hashcode is first called. If the garbage collector moves an object, and another one is created at the original address of the first one, and identityHashCode is called, then the two objects will have the same identity hashcode.

  • The current (Java 8) default algorithm uses a PRNG. The "birthday paradox" formula will tell you the probability that one object's identity hashcode is the same as one more of the other's.


The -XXhashCode=n option that @BastianJ mentioned has the following behavior:

  • hashCode == 0: Returns a freshly generated pseudo-random number

  • hashCode == 1: XORs the object address with a pseudo-random number that changes occasionally.

  • hashCode == 2: The hashCode is 1! (Hence @BastianJ's "cheat" answer.)

  • hashCode == 3: The hashcode is an ascending sequence number.

  • hashCode == 4: the bottom 32 bits of the object address

  • hashCode >= 5: This is the default algorithm for Java 8. It uses Marsaglia's xor-shift PRNG with a thread specific seed.

If you have downloaded the OpenJDK Java 8 source code, you will find the implementation in hotspot/src/share/vm/runtime/synchronizer.cp. Look for the get_next_hash() method.


So that is another way to prove it. Show him the source code!

like image 44
Stephen C Avatar answered Sep 23 '22 04:09

Stephen C