Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to guarantee that equals() and hashCode() are in sync?

Tags:

We are writing a class which requires very complicated logic to compute equals() and hashCode(). Something along lines with:

@Getters @Setters @FieldDefaults(level=AccessLevel.PRIVATE)
public class ExternalData {
  TypeEnum type;
  String data;
  List<ExternalData> children;
} 

We do not construct these objects, they are deserialized from XML from external complex system. There are 20+ types and depending on type data can be ignored, or processed with children, or processed without children and comparison of data for each type of node depend on type.

We created equals() and hashCode() to reflect all those rules, but recently ran into an issue that hashCode got out of sync with equals causing equal objects to be added twice to a HashSet. I believe HashMap (and HashSet for that matter) are implemented this way in Java: https://en.wikipedia.org/wiki/Hash_table The implementation first put objects in buckets based on hashCode and then for each bucket checks equals. In unfortunate scenario where 2 equal object will go to different buckets they will never be compared by equals(). By "Out of sync" here I mean that they go into different buckets.

What is the best way to make sure equals and hashCode do not get out of sync?

Edit: This question is different from What issues should be considered when overriding equals and hashCode in Java? There they ask about generic guidance and the accepted answer does not apply in my situation. They say "make equals and hashCode consistent", here I'm asking how exactly do I do that.

like image 228
Artem Avatar asked Jul 03 '16 21:07

Artem


2 Answers

The Guava testlib library has a class called EqualsTester that can be used to write tests for your equals() and hashCode() implementations.

Adding tests both helps you ensure that the code is correct now, and also ensures that it stays correct if/when you modify it in the future.

like image 169
Daniel Pryden Avatar answered Dec 17 '22 01:12

Daniel Pryden


If the traversal algorithm is sufficiently complex that you want to avoid repeating yourself, isolate the algorithm into a method that both equals and hashCode can use.

I see two options, which (as is so often the case) trade-off between being broadly-applicable and being efficient.

Broadly-applicable

The first option is to write quite a general traversal method that accepts a functional interface and calls back to it at each stage of the traversal, so you can pass a lambda or instance into it containing the actual logic you want to perform while traversing; the Visitor pattern. That interface would want to have a way to say "stop traversing" (e.g., so equals can bail when it knows the answer is "not equal"). Conceptually, that would look something like:

private boolean traverse(Visitor visitor) {
    while (/*still traversing*/) {
        if (!visitor.visitNode(thisNode)) {
            return false;
        }
        /*determine next node to visit and whether done*/
    }
    return true;
}

Then equals and hashCode use that to implement equality checking or hash code building without having to know the traversal algorithm.

I've chosen above to have the method return a flag for whether traversal ended early, but that's a design detail. You might not return anything, or might return this for chaining, whatever's suitable in your situation.

The issue, though, is that using it means allocating an instance (or using a lambda but then you probably need to allocate something for the lamba to update anyway to keep track of what it's doing) and doing a lot of method calls. Maybe that's fine in your case; maybe it's a performance killer because your app needs to use equals a lot. :-)

Specific and efficient

...and so you might want to write something specific to this case, writing something that has the logic for equals and hashCode built into it. It would return the hash code when being used by hashCode, or a flag value for equals (0 = not equal, !0 = equal). No longer generally-useful, but it avoids creating a visitor instance to pass in / lambda overhead / call overhead. Conceptually, this might look something like:

private int equalsHashCodeWorker(Object other, boolean forEquals) {
    int code = 0;

    if (forEquals && other == null) {
        // not equal
    } else {
        while (/*still traversing*/) {
            /*update `code` depending on the results for this node*/
        }
    }

    return code;    
}

Again the specifics will be, um, specific to your case as well as your style guide and such. Some folks would make the other argument serve two purposes (both flag and the "other" object) by having equals handle the other == null case itself and only call this worker when it has a non-null object. I prefer to avoid doubling up on the meaning of arguments like that, but you see it often enough.

Testing

Whichever way you go, if you're in a shop with a testing culture, naturally you'll want to build tests for the complex cases you've already seen fail as well as other cases where you see failure opportunities.

Side note about hashCode

Regardless of the above, if you expect hashCode to be called a lot, you might consider caching the result to an instance field. If the object you're doing this with is mutable (and it sounds like it is), you'd invalidate that stored hashcode any time you mutate the object's state. That way, if the object hasn't changed, you don't have to repeat the traversal on subsequent calls to hashCode. But of course, if you forget to invalidate the hash code in even one of your mutator methods...

like image 31
T.J. Crowder Avatar answered Dec 17 '22 03:12

T.J. Crowder