Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

is it incorrect to define an hashcode of an object as the sum, multiplication, whatever, of all class variables hashcodes?

Let's say I have the following class:

class ABC {
    private int myInt = 1;
    private double myDouble = 2;
    private String myString = "123";
    private SomeRandomClass1 myRandomClass1 = new ...
    private SomeRandomClass2 myRandomClass2 = new ...

    //pseudo code
    public int myHashCode() {
        return 37 *
               myInt.hashcode() *
               myDouble.hashCode() *
               ... *
               myRandomClass.hashcode()
    }
}

Would this be a correct implementation of hashCode? This is not how I usually do it(I tend to follow Effective Java's guide-lines) but I always have the temptation to just do something like the above code.

Thanks

like image 515
devoured elysium Avatar asked Apr 29 '10 05:04

devoured elysium


2 Answers

I'm probably much too late to this party, but.....

The problem with your hash function can be described by the commutative law of multiplication. - khan academy link

That is - a * b * c == c * b * a.

So by only multiplying you remove the uniqueness of the order. This means you are much more likely to get collisions.

like image 137
Neil Mason Avatar answered Sep 19 '22 16:09

Neil Mason


It depends what you mean by "correct". Assuming that you're using the hashCode() of all the relevant equals()-defining fields, then yes, it's "correct". However, such formulas probably will not have a good distribution, and therefore would likely cause more collisions than otherwise, which will have a detrimental effect on performance.

Here's a quote from Effective Java 2nd Edition, Item 9: Always override hashCode when you override equals

While the recipe in this item yields reasonably good hash functions, it does not yield state-of-the-art hash functions, nor do Java platform libraries provide such hash functions as of release 1.6. Writing such hash functions is a research topic, best left to mathematicians and computer scientists. [...Nonetheless,] the techniques described in this item should be adequate for most applications.

It may not require a lot of mathematical power to evaluate how good your proposed hash function is, but why even bother? Why not just follow something that has been anecdotally proven to be adequate in practice?

Josh Bloch's recipe

  • Store some constant nonzero value, say 17, in an int variable called result.
  • Compute an int hashcode c for each field:
    • If the field is a boolean, compute (f ? 1 : 0)
    • If the field is a byte, char, short, int, compute (int) f
    • If the field is a long, compute (int) (f ^ (f >>> 32))
    • If the field is a float, compute Float.floatToIntBits(f)
    • If the field is a double, compute Double.doubleToLongBits(f), then hash the resulting long as in above.
    • If the field is an object reference and this class's equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If the value of the field is null, return 0.
    • If the field is an array, treat it as if each element is a separate field. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
  • Combine the hashcode c into result as follows: result = 31 * result + c;

Now, of course that recipe is rather complicated, but luckily, you don't have to reimplement it every time, thanks to java.util.Arrays.hashCode(Object[]) (and com.google.common.base.Objects provides a convenient vararg variant).

@Override public int hashCode() {
    return Arrays.hashCode(new Object[] {
           myInt,    //auto-boxed
           myDouble, //auto-boxed
           myRandomClass,
    });
}

See also

  • Object.hashCode()

    It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

like image 23
polygenelubricants Avatar answered Sep 20 '22 16:09

polygenelubricants