
Implementing GetHashCode for IEqualityComparer<T> with conditional equality

I'm wondering if anyone has any suggestions for this problem.

I'm using Intersect and Except (LINQ) with a custom IEqualityComparer<T> to compute the set intersections and set differences of two sequences of ISyncableUsers.

public interface ISyncableUser
{
    string Guid { get; }
    string UserPrincipalName { get; }
}

The logic that decides whether two ISyncableUsers are equal is conditional. The conditions center on whether each of the two properties, Guid and UserPrincipalName, has a value. The best way to explain this logic is with code. Below is the Equals method of my custom IEqualityComparer.

public bool Equals(ISyncableUser userA, ISyncableUser userB)
{
    if (userA == null && userB == null)
    {
        return true;
    }

    if (userA == null)
    {
        return false;
    }

    if (userB == null)
    {
        return false;
    }

    if ((!string.IsNullOrWhiteSpace(userA.Guid) && !string.IsNullOrWhiteSpace(userB.Guid)) &&
        userA.Guid == userB.Guid)
    {
        return true;
    }

    if (UsersHaveUpn(userA, userB))
    {
        if (userB.UserPrincipalName.Equals(userA.UserPrincipalName, StringComparison.InvariantCultureIgnoreCase))
        {
            return true;
        }
    }
    return false;
}

private bool UsersHaveUpn(ISyncableUser userA, ISyncableUser userB)
{
    return !string.IsNullOrWhiteSpace(userA.UserPrincipalName)
            && !string.IsNullOrWhiteSpace(userB.UserPrincipalName);
}

The problem I'm having is implementing GetHashCode so that the conditional equality shown above is respected. The only way I've been able to get the Intersect and Except calls to work as expected is to simply always return the same value from GetHashCode(), forcing a call to Equals.

public int GetHashCode(ISyncableUser obj)
{
    return 0;
}

This works, but the performance penalty is huge, as expected. (I've tested this with non-conditional equality. With two sets containing 50000 objects, a proper hash code implementation allows Intersect and Except to execute in about 40ms. A hash code implementation that always returns 0 takes approximately 144000ms (yes, 2.4 minutes!))

So, how would I go about implementing a GetHashCode() in the scenario above?

Any thoughts would be more than welcome!

Sam Shiles asked Oct 07 '22 03:10



1 Answer

If I'm reading this correctly, your equality relation is not transitive. Picture the following three ISyncableUsers:

A { Guid: "1", UserPrincipalName: "2" }
B { Guid: "2", UserPrincipalName: "2" }
C { Guid: "2", UserPrincipalName: "1" }
  • A == B because they have the same UserPrincipalName
  • B == C because they have the same Guid
  • A != C because they don't share either.
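The three cases above can be checked directly. The following is a minimal sketch, assuming a hypothetical SyncableUser class and a static AreEqual helper (neither is in the question) that applies the same rule as the question's Equals method:

```csharp
using System;

public interface ISyncableUser
{
    string Guid { get; }
    string UserPrincipalName { get; }
}

// Hypothetical concrete type, introduced only for this illustration.
public sealed class SyncableUser : ISyncableUser
{
    public SyncableUser(string guid, string upn)
    {
        Guid = guid;
        UserPrincipalName = upn;
    }

    public string Guid { get; }
    public string UserPrincipalName { get; }
}

public static class TransitivityDemo
{
    // Same rule as the question's Equals: two users are equal when both
    // Guids are present and identical, or when both UPNs are present and
    // match ignoring case.
    public static bool AreEqual(ISyncableUser a, ISyncableUser b)
    {
        if (a == null || b == null)
        {
            return a == null && b == null;
        }

        bool guidMatch = !string.IsNullOrWhiteSpace(a.Guid)
            && !string.IsNullOrWhiteSpace(b.Guid)
            && a.Guid == b.Guid;

        bool upnMatch = !string.IsNullOrWhiteSpace(a.UserPrincipalName)
            && !string.IsNullOrWhiteSpace(b.UserPrincipalName)
            && string.Equals(a.UserPrincipalName, b.UserPrincipalName,
                StringComparison.InvariantCultureIgnoreCase);

        return guidMatch || upnMatch;
    }
}
```

With A = new SyncableUser("1", "2"), B = new SyncableUser("2", "2") and C = new SyncableUser("2", "1"), AreEqual(A, B) and AreEqual(B, C) both return true while AreEqual(A, C) returns false.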

From the documentation for IEqualityComparer<T>.Equals:

The Equals method is reflexive, symmetric, and transitive. That is, it returns true if used to compare an object with itself; true for two objects x and y if it is true for y and x; and true for two objects x and z if it is true for x and y and also true for y and z.

If your equality relation isn't transitive, there's no way you can implement a useful (non-constant) hash code that is consistent with it.
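To see what goes wrong in practice, here is a rough sketch of a comparer whose hash function is pluggable, so the same Equals rule can be tried with different hashes. The SyncableUser class, the SyncableUserComparer name and the HashDemo helper are assumptions made for this illustration, not part of the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface ISyncableUser
{
    string Guid { get; }
    string UserPrincipalName { get; }
}

// Hypothetical concrete type, introduced only for this illustration.
public sealed class SyncableUser : ISyncableUser
{
    public SyncableUser(string guid, string upn)
    {
        Guid = guid;
        UserPrincipalName = upn;
    }

    public string Guid { get; }
    public string UserPrincipalName { get; }
}

// Equals follows the question's rule; the hash function is injected so
// different GetHashCode strategies can be compared.
public sealed class SyncableUserComparer : IEqualityComparer<ISyncableUser>
{
    private readonly Func<ISyncableUser, int> _hash;

    public SyncableUserComparer(Func<ISyncableUser, int> hash)
    {
        _hash = hash;
    }

    public bool Equals(ISyncableUser a, ISyncableUser b)
    {
        if (a == null || b == null)
        {
            return a == null && b == null;
        }

        bool guidMatch = !string.IsNullOrWhiteSpace(a.Guid)
            && !string.IsNullOrWhiteSpace(b.Guid)
            && a.Guid == b.Guid;

        bool upnMatch = !string.IsNullOrWhiteSpace(a.UserPrincipalName)
            && !string.IsNullOrWhiteSpace(b.UserPrincipalName)
            && string.Equals(a.UserPrincipalName, b.UserPrincipalName,
                StringComparison.InvariantCultureIgnoreCase);

        return guidMatch || upnMatch;
    }

    public int GetHashCode(ISyncableUser obj)
    {
        return _hash(obj);
    }
}

public static class HashDemo
{
    // Intersects { A } with { B }, where A and B are equal only by UPN,
    // and returns how many matches Intersect reports.
    public static int CountMatches(Func<ISyncableUser, int> hash)
    {
        var first = new ISyncableUser[] { new SyncableUser("1", "2") };
        var second = new ISyncableUser[] { new SyncableUser("2", "2") };
        return first.Intersect(second, new SyncableUserComparer(hash)).Count();
    }
}
```

HashDemo.CountMatches(u => 0) finds the match, because every element lands in the same bucket and Equals gets called. HashDemo.CountMatches(u => u.Guid.GetHashCode()) misses it whenever the hash codes of "1" and "2" differ (which is all but certain), because Equals is never consulted for elements whose hash codes differ.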

From another point of view: you're essentially looking for three functions:

  • G mapping GUIDs to ints (if you know the GUID but the UPN is blank)
  • U mapping UPNs to ints (if you know the UPN but the GUID is blank)
  • P mapping (guid, upn) pairs to ints (if you know both)

such that G(g) == U(u) == P(g, u) for all g and u. This is only possible if you ignore g and u completely.
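If what you actually need is "which users in one sequence match some user in the other", one possible way around the hashing problem is to skip IEqualityComparer and index the Guids and UPNs separately. This is only a sketch of that idea, not something the reasoning above prescribes; the SyncableUser class and the FindMatches helper are hypothetical names, and unlike Intersect this version does not de-duplicate its results and ignores null users:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface ISyncableUser
{
    string Guid { get; }
    string UserPrincipalName { get; }
}

// Hypothetical concrete type, introduced only for this illustration.
public sealed class SyncableUser : ISyncableUser
{
    public SyncableUser(string guid, string upn)
    {
        Guid = guid;
        UserPrincipalName = upn;
    }

    public string Guid { get; }
    public string UserPrincipalName { get; }
}

public static class SyncableUserMatching
{
    // Returns the users in 'source' that match at least one user in 'other',
    // either by Guid (ordinal comparison) or by UserPrincipalName
    // (invariant culture, ignoring case). Null users are skipped.
    public static List<ISyncableUser> FindMatches(
        IEnumerable<ISyncableUser> source,
        IEnumerable<ISyncableUser> other)
    {
        // Each key type gets its own hash set, so each lookup uses a hash
        // that is consistent with the comparison for that key.
        var guids = new HashSet<string>(StringComparer.Ordinal);
        var upns = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase);

        foreach (var user in other)
        {
            if (user == null) continue;
            if (!string.IsNullOrWhiteSpace(user.Guid)) guids.Add(user.Guid);
            if (!string.IsNullOrWhiteSpace(user.UserPrincipalName)) upns.Add(user.UserPrincipalName);
        }

        return source
            .Where(u => u != null
                && ((!string.IsNullOrWhiteSpace(u.Guid) && guids.Contains(u.Guid))
                    || (!string.IsNullOrWhiteSpace(u.UserPrincipalName)
                        && upns.Contains(u.UserPrincipalName))))
            .ToList();
    }
}
```

Negating the Where condition would give the rough counterpart of Except. Both directions stay linear in the sizes of the two sequences, since each key is looked up in its own hash set rather than through a single hash code for the whole user.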

Rawling answered Oct 11 '22 02:10
