Uniform distribution of hashCode()

I define my class as:

import java.util.Objects;

final class Key<T extends Comparable<T>> {
    private final T q;
    private final T o;

    public Key(T q1, T o1) {
        q = q1;
        o = o1;
    }

    @Override
    public boolean equals(Object obj) {
        // instanceof is already false for null, so no separate null check is needed
        if (obj instanceof Key) {
            Key<?> s = (Key<?>) obj;
            return q.equals(s.q) && o.equals(s.o);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return Objects.hash(q, o);
    }
}

I also define an array to hold Key objects. For example:

Object arr[] = new Object[100];
Key<String> k = new Key<>("a","b");
int h = k.hashCode();
...
arr[h+i % h] = k; //i from 1 to 10 for example

The problem is that hashCode() can return a negative value, so

arr[h+i % h] = k;

can throw an ArrayIndexOutOfBoundsException. That's why I changed my code as follows (based on what I found when searching for how to keep hashCode() from returning a negative value):

@Override
public int hashCode() {
    return Objects.hash(q, o) & 0x7FFFFFFF;
}

If I do it this way, does the uniformity of the hashCode() distribution change? In other words, does the probability that two different objects produce the same value increase?

nd07 asked Apr 15 '16 08:04




2 Answers

Please have a look at MurmurHash and the question "MurmurHash - what is it?". Fortunately, Google Guava has a ready-made implementation of it.

The Guava approach looks like the example below. It uses the following classes:

import com.google.common.base.Charsets;
import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

Using these classes, my methods for generating the hash code look like this:

/**
 * getMurmur128Hash.
 *
 * @param content the string to hash
 * @return HashCode the 128-bit Murmur3 hash of the UTF-8 encoded content
 */
public static HashCode getMurmur128Hash(String content) {
    final HashFunction hf = Hashing.murmur3_128();
    final HashCode hc = hf.newHasher().putString(content, Charsets.UTF_8).hash();
    return hc;
}

/**
 * getAbsMurmur128HashAsLongVal.
 *
 * @param content the string to hash
 * @return Long Absolute value of Long for the HashCode.
 */
public static Long getAbsMurmur128HashAsLongVal(String content) {
    // Note: Math.abs(Long.MIN_VALUE) is still negative, so this one value is not made non-negative.
    return Math.abs(getMurmur128Hash(content).asLong());
}
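
A caveat on the absolute-value step, shown as a minimal sketch that uses only the JDK: Math.abs returns a negative number for Long.MIN_VALUE (and likewise for Integer.MIN_VALUE), so on its own it does not guarantee a non-negative result. Clearing the sign bit, or using Math.floorMod when reducing a hash to an array index, avoids that case. The class and method names (NonNegativeHash, clearSignBit, toIndex) are hypothetical and chosen only for illustration.

```java
public class NonNegativeHash {

    // Clears the sign bit, so the result is always in [0, Long.MAX_VALUE].
    static long clearSignBit(long hash) {
        return hash & Long.MAX_VALUE;
    }

    // Maps any int hash, including negative ones, to an index in [0, length).
    static int toIndex(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    public static void main(String[] args) {
        // -Long.MIN_VALUE is not representable, so Math.abs returns it unchanged.
        System.out.println(Math.abs(Long.MIN_VALUE) < 0);  // true
        System.out.println(clearSignBit(Long.MIN_VALUE));  // 0
        System.out.println(toIndex(-7, 100));              // 93
    }
}
```

Math.floorMod keeps all 32 bits of the hash in play before the reduction, while the mask discards the sign bit; for an array of 100 slots either choice gives a valid index.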
Ram Ghadiyaram answered Sep 19 '22 17:09



Objects.hash() uses a very simple algorithm that is not particularly uniform for simple inputs. For example, Objects.hash("B", "B") and Objects.hash("A", "a") have the same hash code. (By the way, it is simple enough that I could work that out in my head.)

Also, every hash code for a pair of single lowercase letters, from Objects.hash("a", "a") to Objects.hash("z", "z"), lies between 4065 and 4865, which doesn't look particularly uniform, especially in the higher bits.

Within this context, I think you can say you are not making matters any worse.
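
The figures above can be checked with a short sketch. It relies on the documented JDK behaviour that Objects.hash computes its result as Arrays.hashCode would; the class name HashSpread and the helper maskedHash are hypothetical. It also illustrates why the mask adds no collisions for these inputs: the values are already non-negative, so clearing the sign bit leaves them unchanged. More generally, the mask can merge two hash codes only when they differ in the sign bit alone.

```java
import java.util.Objects;

public class HashSpread {

    // Same masking as in the question: clear the sign bit of the combined hash.
    static int maskedHash(Object q, Object o) {
        return Objects.hash(q, o) & 0x7FFFFFFF;
    }

    public static void main(String[] args) {
        // Two different pairs that collide under Objects.hash.
        System.out.println(Objects.hash("B", "B")); // 3073
        System.out.println(Objects.hash("A", "a")); // 3073

        // Smallest and largest values for pairs of single lowercase letters.
        System.out.println(Objects.hash("a", "a")); // 4065
        System.out.println(Objects.hash("z", "z")); // 4865

        // These values are already non-negative, so the mask changes nothing.
        System.out.println(maskedHash("a", "a") == Objects.hash("a", "a")); // true
    }
}
```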

Peter Lawrey answered Sep 20 '22 17:09
