
How to implement IEqualityComparer to return distinct values?

I have a LINQ to Entities (L2E) query that returns data containing duplicate objects, and I need to remove those duplicates. Two objects should be treated as duplicates if their IDs are the same. I tried q.Distinct(), but it still returned duplicate objects. Then I tried implementing my own IEqualityComparer and passing it to the Distinct() method. The call failed with the following message:

LINQ to Entities does not recognize the method 'System.Linq.IQueryable`1[DAL.MyDOClass] Distinct[MyDOClass](System.Linq.IQueryable`1[DAL.MyDOClass], System.Collections.Generic.IEqualityComparer`1[DAL.MyDOClass])' method, and this method cannot be translated into a store expression.

And here is the implementation of EqualityComparer:

    internal class MyDOClassComparer : EqualityComparer<MyDOClass>
    {
        public override bool Equals(MyDOClass x, MyDOClass y)
        {
            return x.Id == y.Id;
        }

        public override int GetHashCode(MyDOClass obj)
        {
            return obj == null ? 0 : obj.Id;
        }
    }

So how do I write my own IEqualityComparer properly?

Bogdan Verbenets asked Dec 19 '11 11:12



2 Answers

An EqualityComparer is not the way to go here: it cannot be translated to SQL, so it can only filter your result set in memory, e.g.:

var objects = yourResults.AsEnumerable().Distinct(yourEqualityComparer); 

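Instead, you can use the GroupBy method to group by ID and the First method to take one entry per group, so that the database returns only a unique entry per ID (some Entity Framework versions reject First inside a projection and require FirstOrDefault instead), e.g.: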

var objects = yourResults.GroupBy(o => o.Id).Select(g => g.First()); 
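
To compare the two approaches side by side, here is a minimal sketch that runs against an in-memory list rather than a real Entity Framework context. The `Item` class, its `Name` property, the sample data and the `DistinctSketch` helper are assumptions made for illustration; only the Id-based equality comes from the question. Against a real context, the `GroupBy` version is the one a provider could translate to SQL, while the comparer version only runs after `AsEnumerable()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity standing in for MyDOClass.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Id-based comparer; usable only by LINQ to Objects.
public class ItemIdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Item obj)
    {
        return obj == null ? 0 : obj.Id;
    }
}

public static class DistinctSketch
{
    // Sample data containing one duplicate Id.
    public static List<Item> Sample()
    {
        return new List<Item>
        {
            new Item { Id = 1, Name = "a" },
            new Item { Id = 2, Name = "b" },
            new Item { Id = 1, Name = "a" },
        };
    }

    // In-memory: switch to LINQ to Objects, then apply the comparer.
    public static List<Item> WithComparer(IQueryable<Item> query)
    {
        return query.AsEnumerable().Distinct(new ItemIdComparer()).ToList();
    }

    // Query-side: no comparer involved, only operators a provider can inspect.
    public static List<Item> WithGroupBy(IQueryable<Item> query)
    {
        return query.GroupBy(o => o.Id).Select(g => g.First()).ToList();
    }
}
```

Passing `DistinctSketch.Sample().AsQueryable()` to either method yields two items, one per Id.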
Rich O'Kelly answered Sep 24 '22 23:09



rich.okelly and Ladislav Mrnka are both correct in different ways.

Both answers deal with the fact that the methods of an IEqualityComparer<T> won't be translated to SQL.

I think it's worth looking at the pros and cons of each, which will take a bit more than a comment.

rich's approach rewrites the query as a different query with the same ultimate result. The resulting SQL should be more or less what you would write by hand to do this efficiently.

Ladislav's approach pulls the data out of the database just before the distinct step, at which point an in-memory approach will work.

Since the database is great at the sort of grouping and filtering that rich's approach depends upon, it will likely be the most performant in this case. You could find, though, that whatever happens prior to this grouping is complex enough that LINQ to Entities doesn't generate a single clean query but instead produces a bunch of queries and then does some of the work in memory, which could be pretty nasty.

Generally, grouping is more expensive than distinct when done in memory (especially if you bring the data into memory with ToList() rather than AsEnumerable()). So if you were already going to bring the data into memory at this stage because of some other requirement, Ladislav's approach would be more performant.
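
One reason `ToList()` can cost more than `AsEnumerable()` is that it buffers every element before the next operator runs, whereas `AsEnumerable()` lets `Distinct` stream. The sketch below counts how many source elements are pulled in each case. The `Rows` iterator and the `Pulled` counter are assumptions for illustration; in a real query each pull would correspond to reading a row from the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingSketch
{
    // Counts how many elements a consumer has pulled from Rows().
    public static int Pulled;

    // Hypothetical row source; each yielded value stands in for one row read.
    public static IEnumerable<int> Rows()
    {
        foreach (var id in new[] { 1, 1, 2, 3 })
        {
            Pulled++;
            yield return id;
        }
    }

    // Streams: Distinct pulls rows only until Take(1) is satisfied.
    public static int PulledWhenStreaming()
    {
        Pulled = 0;
        Rows().AsEnumerable().Distinct().Take(1).ToList();
        return Pulled;
    }

    // Buffers: ToList() reads every row before Distinct sees any of them.
    public static int PulledWhenBuffered()
    {
        Pulled = 0;
        Rows().ToList().Distinct().Take(1).ToList();
        return Pulled;
    }
}
```

Here the streaming version pulls a single element, while the buffered version pulls all four before doing any work. The difference only matters when later operators can stop early or when the full result set is large.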

It would also be the only choice if your equality definition didn't map well onto what is available in the database, and of course it lets you switch equality definitions, for example by accepting an IEqualityComparer<T> as a parameter.
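
As a sketch of that last point, the helper below takes the equality definition as a parameter and applies it once the data is in memory. `DistinctInMemory`, `Person`, the sample data and both comparers are hypothetical names introduced for illustration, not part of either answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this illustration.
public class Person
{
    public int Id { get; set; }
    public string Email { get; set; }
}

// Equality by Id, as in the question.
public class PersonByIdComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Person obj)
    {
        return obj == null ? 0 : obj.Id;
    }
}

// Equality by e-mail address, ignoring case.
public class PersonByEmailComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.Email, y.Email, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(Person obj)
    {
        if (obj == null || obj.Email == null) return 0;
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Email);
    }
}

public static class ComparerParameterSketch
{
    // The caller chooses the equality definition; AsEnumerable() makes sure
    // Distinct runs in memory, where any comparer is allowed.
    public static List<T> DistinctInMemory<T>(IQueryable<T> query, IEqualityComparer<T> comparer)
    {
        return query.AsEnumerable().Distinct(comparer).ToList();
    }

    // Three different Ids whose e-mail addresses differ only by case.
    public static IQueryable<Person> Sample()
    {
        return new List<Person>
        {
            new Person { Id = 1, Email = "a@x.org" },
            new Person { Id = 2, Email = "A@X.ORG" },
            new Person { Id = 3, Email = "A@x.org" },
        }.AsQueryable();
    }
}
```

With this sample, `DistinctInMemory(Sample(), new PersonByIdComparer())` keeps all three items, while passing `new PersonByEmailComparer()` keeps only the first.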

In all, rich's is the answer I'd say is most likely to be the best choice here, but the different pros and cons of Ladislav's compared to rich's make it well worth studying and considering too.

Jon Hanna answered Sep 23 '22 23:09
