Implement IEquatable for POCO

I noticed that EF's DbSet.Add() is quite slow. A little googling turned up an SO answer that promises up to 180x performance gains:

https://stackoverflow.com/a/7052504/141172

However, I do not understand exactly how to implement IEquatable<T> as suggested in the answer.

According to MSDN, if I implement IEquatable<T>, I should also override Equals() and GetHashCode().

As with many POCOs, my objects are mutable. Before being committed to the database (SaveChanges()), new objects have an Id of 0. After the objects have been saved, the Id serves as an ideal basis for implementing IEquatable<T>, Equals() and GetHashCode().

It is unwise to include any mutable property in a hash code, and MSDN states:

If two objects compare as equal, the GetHashCode method for each object must return the same value

Should I implement IEquatable<T> as a property-by-property comparison (e.g. this.FirstName == other.FirstName) and not override Equals() and GetHashCode()?

Given that my POCOs are used in an Entity Framework context, should any special attention be paid to the Id field?

Eric J. asked Mar 20 '12 06:03


2 Answers

I came across your question while searching for a solution to the same problem. Here is a solution that I am trying out; see if it meets your needs.

First, all my POCOs derive from this abstract class:

public abstract class BasePOCO<T> : IEquatable<T> where T : class
{
    private readonly Guid _guid = Guid.NewGuid();

    #region IEquatable<T> Members

    public abstract bool Equals(T other);

    #endregion

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj))
        {
            return false;
        }
        if (ReferenceEquals(this, obj))
        {
            return true;
        }
        if (obj.GetType() != typeof (T))
        {
            return false;
        }
        return Equals((T)obj);
    }

    public override int GetHashCode()
    {
        return _guid.GetHashCode();
    }
}

I created a readonly Guid field that I use in the GetHashCode() override. This ensures that if I put a derived POCO into a Dictionary or anything else that uses the hash, I would not orphan it if I called .SaveChanges() in the interim and the ID field was updated by the base class. This is the one part I'm not sure is completely correct, or whether it is any better than just base.GetHashCode(). I made the Equals(T other) method abstract so that the implementing classes have to implement it in some meaningful way, most likely with the ID field. I put the Equals(object obj) override in this base class because it would probably be the same for all the derived classes too.

This would be an implementation of the abstract class:

public class Species : BasePOCO<Species>
{
    public int ID { get; set; }
    public string LegacyCode { get; set; }
    public string Name { get; set; }

    public override bool Equals(Species other)
    {
        if (ReferenceEquals(null, other))
        {
            return false;
        }
        if (ReferenceEquals(this, other))
        {
            return true;
        }
        return ID != 0 && 
               ID == other.ID && 
               LegacyCode == other.LegacyCode &&
               Name == other.Name;
    }
}

The ID property is set as the primary key in the database and EF knows that. ID is 0 on newly created objects, then gets set to a unique positive integer on .SaveChanges(). So in the overridden Equals(Species other) method, null objects are obviously not equal and identical references obviously are; after that we only need to check whether ID == 0. If it is, we say that two distinct objects of the same type are not equal, even when both have an ID of 0. Otherwise, we say they are equal if their properties are all the same.

I think this covers all the relevant situations, but please chime in if I am incorrect. Hope this helps.

=== Edit 1

I suspected my GetHashCode() wasn't right, so I looked at this answer on the subject: https://stackoverflow.com/a/371348/213169. The implementation above violates the constraint that objects for which Equals() returns true must have the same hash code.
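To make the problem concrete, here is a minimal sketch that mirrors the first attempt. The names GuidHashedSpecies and ContractDemo are hypothetical, made up for this illustration. Two instances with the same ID and Name compare equal, but each carries its own Guid, so their hash codes almost certainly differ and a HashSet will typically fail to find an equal item.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type mirroring the first attempt: value-based Equals,
// but a hash code taken from a per-instance Guid.
public sealed class GuidHashedSpecies : IEquatable<GuidHashedSpecies>
{
    private readonly Guid _guid = Guid.NewGuid();

    public int ID { get; set; }
    public string Name { get; set; }

    public bool Equals(GuidHashedSpecies other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return ID != 0 && ID == other.ID && Name == other.Name;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as GuidHashedSpecies);
    }

    // Different for every instance, even when Equals says "equal".
    public override int GetHashCode()
    {
        return _guid.GetHashCode();
    }
}

public static class ContractDemo
{
    // Returns true when a HashSet holding `a` also reports that it contains `b`.
    public static bool SetFindsEqualItem(GuidHashedSpecies a, GuidHashedSpecies b)
    {
        var set = new HashSet<GuidHashedSpecies> { a };
        return set.Contains(b);
    }
}
```

Calling ContractDemo.SetFindsEqualItem with two separately constructed objects that have the same ID and Name shows the mismatch: Equals reports true, while the set lookup is expected to miss.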

Here is my second stab at it:

public abstract class BasePOCO<T> : IEquatable<T> where T : class
{
    #region IEquatable<T> Members

    public abstract bool Equals(T other);

    #endregion

    public abstract override bool Equals(object obj);
    public abstract override int GetHashCode();
}

And the implementation:

public class Species : BasePOCO<Species>
{
    public int ID { get; set; }
    public string LegacyCode { get; set; }
    public string Name { get; set; }

    public override bool Equals(Species other)
    {
        if (ReferenceEquals(null, other))
        {
            return false;
        }
        if (ReferenceEquals(this, other))
        {
            return true;
        }
        return ID != 0 &&
               ID == other.ID &&
               LegacyCode == other.LegacyCode &&
               Name == other.Name;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj))
        {
            return false;
        }
        if (ReferenceEquals(this, obj))
        {
            return true;
        }
        return Equals(obj as Species);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((LegacyCode != null ? LegacyCode.GetHashCode() : 0) * 397) ^ 
                   (Name != null ? Name.GetHashCode() : 0);
        }
    }

    public static bool operator ==(Species left, Species right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Species left, Species right)
    {
        return !Equals(left, right);
    }
}

So I got rid of the Guid in the base class and moved GetHashCode() to the implementation. I used ReSharper's implementation of GetHashCode() with all the properties except ID, since ID could change (I don't want orphans). This meets the constraint on equality in the linked answer above: two objects that compare equal have the same LegacyCode and Name, and therefore the same hash code.
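The "no orphans" point can be sketched as follows. TrackedSpecies and HashStabilityDemo are hypothetical names used only for this illustration, and assigning ID by hand stands in for what SaveChanges() would do with a database-generated key. Because the hash code ignores ID, an object that was added to a HashSet before it was saved is still found afterwards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: the hash code ignores ID and uses only Name.
public sealed class TrackedSpecies
{
    public int ID { get; set; }
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as TrackedSpecies;
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return ID != 0 && ID == other.ID && Name == other.Name;
    }

    public override int GetHashCode()
    {
        return Name != null ? Name.GetHashCode() : 0;
    }
}

public static class HashStabilityDemo
{
    // Simulates SaveChanges() assigning a key after the object was added to a set.
    public static bool SurvivesIdAssignment()
    {
        var species = new TrackedSpecies { Name = "Lynx" };
        var set = new HashSet<TrackedSpecies> { species };
        species.ID = 42; // stand-in for the database-generated key
        return set.Contains(species); // hash unchanged, so the entry is still found
    }
}
```

The same protection does not extend to the properties that do feed the hash: changing Name (or LegacyCode in the class above) while the object sits in a hashed collection would still change its hash code.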

Jon Comtois answered Nov 04 '22 12:11


As with many POCO's, my objects are mutable

But they should NOT be mutable on the fields that make up the primary key. That holds by definition; otherwise you are in a world of pain database-wise later anyway.

Generate the hash code ONLY from the fields of the primary key.

Equals() must return true IFF the participating objects have the same hash code

BZZZ - Error.

Hash codes can be duplicated. It is possible for two objects to have different values and the same hash code. A hash code is an int (32 bits). A string can be 2 GB long. You cannot map every possible string to a separate hash code.

If two objects have the same hash code, they may still be different. If two objects are equal, they can NOT have different hash codes.
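A small sketch of such a collision, using long instead of string because the values are easy to pick by hand. It relies on an implementation detail that I believe holds for the .NET runtimes: long.GetHashCode() XORs the upper and lower 32 bits of the value. CollisionDemo is a hypothetical helper name.

```csharp
using System;

public static class CollisionDemo
{
    // Two different long values that are expected to share a hash code,
    // assuming long.GetHashCode() XORs the upper and lower 32 bits.
    public static bool DifferentValuesShareHash()
    {
        long a = 0L;          // both halves are 0, XOR gives 0
        long b = 4294967297L; // 2^32 + 1: both halves are 1, XOR gives 0
        return !a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }
}
```

Hashed collections cope with this because they use the hash code only to narrow the search and then call Equals() to decide.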

Where do you get the idea that Equals must return true for objects with the same hashcode?

Also, POCO or not, an object mapped to a database and used in a relation MUST have a stable primary key (which can be used for the hash code calculation). An object that does not have one STILL should have a primary key (per SQL Server requirements); a sequence or artificial primary key works here. Again, use that for the hash code calculation.
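A minimal sketch of what key-only equality could look like. Customer is a hypothetical entity, and the sketch assumes that Id is not changed again once the database has assigned it and that objects are not placed in hashed collections before they have been saved.

```csharp
using System;

// Hypothetical entity: equality and hash code depend only on the primary key.
public class Customer : IEquatable<Customer>
{
    public int Id { get; set; }      // assumed stable once assigned by the database
    public string Name { get; set; } // mutable, deliberately ignored below

    public bool Equals(Customer other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        // Unsaved objects (Id == 0) are only equal to themselves.
        return Id != 0 && Id == other.Id;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Customer);
    }

    // Equal objects share an Id, so they share a hash code.
    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}
```

With this shape, changing Name never affects lookups, and two instances loaded separately for the same row compare equal.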

TomTom answered Nov 04 '22 10:11