I'm just curious.. When I call Distinct<>() (from Linq) on HashSet, does .NET know, that this IEnumerable always contains distinct set of values, and optimizes this call away?
C# Linq Distinct() method removes the duplicate elements from a sequence (list) and returns the distinct elements from a single data source. It comes under the Set operators' category in LINQ query operators, and the method works the same way as the DISTINCT directive in Structured Query Language (SQL).
A HashSet<T> collection is not sorted and cannot contain duplicate elements.
HashSet(IEnumerable) Initializes a new instance of the HashSet class that uses the default equality comparer for the set type, contains elements copied from the specified collection, and has sufficient capacity to accommodate the number of elements copied.
Judging by looking at the code through Reflector, I would have to say no.
The code ends up construct an instance of an iterator method generated class regardless of what type you give it.
This problem is also compounded by the fact that you can specify comparer objects for both the Hashset and the Distinct method, which means the optimization would only be used in very few cases.
For instance, in the following case it could actually optimize the call away, but it wouldn't be able to know that:
var set = new HashSet<int>(new MyOwnInt32Comparer());
var distinct = set.Distinct(new MyOwnInt32Comparer());
Since I give it two instances of the comparer class, and such classes usually doesn't implement equality methods, the Distinct method would have no way of knowing that the two comparer implementations are actually identical.
In any case, this is a case where the programmer knows more about the code than the runtime, so take advantage of it. Linq may be very good but it's not omnipotent, so use your knowledge to your advantage.
I think No, because the input of Enumerable class for distinct method is IEnumerable and there is nothing specific for determining it's a hash set (so do not do anything).
No, looking at the implementation in reflector, it doesn't check if the enumeration is a HashSet<T>. The underlying iterator creates a new set and fills it during enumeration, so the overhead shouldn't be that large though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With