
Doing Distinct() using base class IEqualityComparer, and still return the child class type?

I have a number of classes that derive from a class BaseClass, where BaseClass just has an Id property.

I now need to call Distinct() on collections of some of these objects. I have the following code repeated for each of the child classes:

public class PositionComparer : IEqualityComparer<Position>
{
    public bool Equals(Position x, Position y)
    {
        return (x.Id == y.Id);
    }

    public int GetHashCode(Position obj)
    {
        return obj.Id.GetHashCode();
    }
}

Given that the logic is based only on Id, I wanted to create a single comparer to reduce duplication:

public class BaseClassComparer : IEqualityComparer<BaseClass>
{
    public bool Equals(BaseClass x, BaseClass y)
    {
        return (x.Id == y.Id);
    }

    public int GetHashCode(BaseClass obj)
    {
        return obj.Id.GetHashCode();
    }
}

But this doesn't seem to compile:

  IEnumerable<Position> allPositions = GetAllPositions();
  IEnumerable<Position> positions = allPositions.Distinct(new BaseClassComparer());

...as it says it can't convert from BaseClass to Position. Why does the comparer force the return value of this Distinct() call?

leora asked Apr 06 '13 13:04


3 Answers

UPDATE: This question was the subject of my blog in July 2013. Thanks for the great question!


You have discovered an unfortunate edge case in the generic method type inference algorithm. We have:

Distinct<X>(IEnumerable<X>, IEqualityComparer<X>)

where the interfaces are:

IEnumerable<out T> -- covariant

and

IEqualityComparer<in T> -- contravariant

When we make the inference from allPositions to IEnumerable<X> we say "IEnumerable<T> is covariant in T, so we can accept Position or any larger type." (A base type is "larger" than a derived type; there are more animals than giraffes in the world.)

When we make the inference from the comparer we say "IEqualityComparer<T> is contravariant in T, so we can accept BaseClass or any smaller type."

So what happens when it comes time to actually deduce the type argument? We have two candidates: Position and BaseClass. Both satisfy the stated bounds. Position satisfies the first bound because it is identical to the first bound, and satisfies the second bound because it is smaller than the second bound. BaseClass satisfies the first bound because it is larger than the first bound, and identical to the second bound.
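Both candidates can be seen to work by steering inference toward each one in turn. The following is a minimal sketch, not code from the question: it assumes Id is an int (the question does not say), and VarianceDemo and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumption: Id is an int; the question does not state its type.
public class BaseClass { public int Id { get; set; } }
public class Position : BaseClass { }

public class BaseClassComparer : IEqualityComparer<BaseClass>
{
    public bool Equals(BaseClass x, BaseClass y) => x.Id == y.Id;
    public int GetHashCode(BaseClass obj) => obj.Id.GetHashCode();
}

// Hypothetical demo class, for illustration only.
public static class VarianceDemo
{
    // Candidate X = BaseClass: widen the sequence first (covariance).
    public static int CountWithBaseClass(IEnumerable<Position> allPositions)
    {
        IEnumerable<BaseClass> widened = allPositions;
        IEnumerable<BaseClass> distinct = widened.Distinct(new BaseClassComparer());
        return distinct.Count();
    }

    // Candidate X = Position: narrow the comparer first (contravariance).
    public static int CountWithPosition(IEnumerable<Position> allPositions)
    {
        IEqualityComparer<Position> narrowed = new BaseClassComparer();
        IEnumerable<Position> distinct = allPositions.Distinct(narrowed);
        return distinct.Count();
    }
}
```

Either method compiles because each call leaves only one candidate for X; the ambiguity arises only when the Position sequence and the BaseClass comparer are passed together.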

We have two winners. We need a tie breaker. What do we do in this situation?

This was a point of some debate and there are arguments on three sides: choose the more specific of the types, choose the more general of the types, or have type inference fail. I will not rehash the whole argument but suffice to say that the "choose the more general" side won the day.

(Making matters worse, there is a typo in the spec that says that "choose the more specific" is the right thing to do! This was the result of an editing error during the design process that has never been corrected. The compiler implements "choose the more general". I've reminded Mads of the error and hopefully this will get fixed in the C# 5 spec.)

So there you go. In this situation, type inference chooses the more general type and infers that the call means Distinct<BaseClass>. Type inference never takes the return type into account, and it certainly does not take what the expression is being assigned to into account, so the fact that it chooses a type that is incompatible with the assigned-to variable is not its business.

My advice is to explicitly state the type argument in this case.
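As a minimal sketch of that advice, the call below states the type argument as Position, so no inference takes place and the comparer is converted contravariantly. The int type of Id and the ExplicitTypeArgumentDemo wrapper are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumption: Id is an int; the question does not state its type.
public class BaseClass { public int Id { get; set; } }
public class Position : BaseClass { }

public class BaseClassComparer : IEqualityComparer<BaseClass>
{
    public bool Equals(BaseClass x, BaseClass y) => x.Id == y.Id;
    public int GetHashCode(BaseClass obj) => obj.Id.GetHashCode();
}

// Hypothetical wrapper, for illustration only.
public static class ExplicitTypeArgumentDemo
{
    public static IEnumerable<Position> DistinctPositions(IEnumerable<Position> allPositions)
    {
        // Distinct<Position> fixes X = Position up front. The BaseClassComparer
        // argument is then accepted as an IEqualityComparer<Position>.
        return allPositions.Distinct<Position>(new BaseClassComparer());
    }
}
```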

Eric Lippert answered Nov 17 '22 12:11


If you look at the definition of Distinct, there is only one generic type parameter involved (not one TCollection for the input and output collections and a separate TComparison for the comparer). That means that your BaseClassComparer constrains the result type to BaseClass, and the conversion at the assignment is not possible.

You could create a GenericComparer whose type parameter is constrained to derive from BaseClass, which might get you closer to what you are trying to do. It would look like this:

public class GenericComparer<T> : IEqualityComparer<T> where T : BaseClass
{
    public bool Equals(T x, T y)
    {
        return x.Id == y.Id;
    }

    public int GetHashCode(T obj)
    {
        return obj.Id.GetHashCode();
    }
}

Because you need an instance and not just a method call, the compiler cannot infer the generic type for you (see this discussion); you have to specify it when creating the instance:

IEnumerable<Position> positions;
positions = allPositions.Distinct(new GenericComparer<Position>());
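If repeating the type argument at every call site is a nuisance, one possible variation is to wrap the comparer in a generic extension method, since inference does work for method calls. This is a sketch only: DistinctById and BaseClassExtensions are hypothetical names, and Id is assumed to be an int.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumption: Id is an int; the question does not state its type.
public class BaseClass { public int Id { get; set; } }
public class Position : BaseClass { }

public class GenericComparer<T> : IEqualityComparer<T> where T : BaseClass
{
    public bool Equals(T x, T y) => x.Id == y.Id;
    public int GetHashCode(T obj) => obj.Id.GetHashCode();
}

// Hypothetical helper, for illustration only.
public static class BaseClassExtensions
{
    // T is inferred from the source sequence alone, so the result keeps
    // the element type of the input (for example IEnumerable<Position>).
    public static IEnumerable<T> DistinctById<T>(this IEnumerable<T> source)
        where T : BaseClass
    {
        return source.Distinct(new GenericComparer<T>());
    }
}
```

With this in place, a call such as allPositions.DistinctById() should yield an IEnumerable<Position> without naming any type argument. On .NET 6 or later, the built-in Enumerable.DistinctBy(p => p.Id) may make a custom comparer unnecessary.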

Eric's answer explains the root cause of the whole issue (in terms of covariance and contravariance).

Simon Opelt answered Nov 17 '22 10:11


Imagine if you had:

var positions = allPositions.Distinct(new BaseClassComparer());

What would you expect the type of positions to be? Because the compiler infers the type argument from the comparer passed to Distinct, which implements IEqualityComparer<BaseClass>, the type of the expression is IEnumerable<BaseClass>.

That type can't be automatically converted to IEnumerable<Position>, so the compiler produces an error.
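The inferred type can be made visible by assigning the result to an IEnumerable<BaseClass>, and the elements can then be converted back with Cast<Position>(). The sketch below assumes Id is an int; InferredTypeDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumption: Id is an int; the question does not state its type.
public class BaseClass { public int Id { get; set; } }
public class Position : BaseClass { }

public class BaseClassComparer : IEqualityComparer<BaseClass>
{
    public bool Equals(BaseClass x, BaseClass y) => x.Id == y.Id;
    public int GetHashCode(BaseClass obj) => obj.Id.GetHashCode();
}

// Hypothetical demo class, for illustration only.
public static class InferredTypeDemo
{
    public static IEnumerable<Position> DistinctThenCast(IEnumerable<Position> allPositions)
    {
        // The call is inferred as Distinct<BaseClass>, so the static type
        // of the result is IEnumerable<BaseClass>.
        IEnumerable<BaseClass> distinct = allPositions.Distinct(new BaseClassComparer());

        // Every element is still a Position at run time, so the
        // per-element cast succeeds.
        return distinct.Cast<Position>();
    }
}
```

The cast is checked at run time rather than at compile time, which is one reason stating the type argument on Distinct is usually the tidier fix.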

Zdeslav Vojkovic answered Nov 17 '22 10:11