Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calling Distinct<>() on HashSet<T>

Tags:

c#

.net

linq

I'm just curious.. When I call Distinct<>() (from Linq) on HashSet, does .NET know, that this IEnumerable always contains distinct set of values, and optimizes this call away?

like image 231
nothrow Avatar asked Dec 10 '10 11:12

nothrow


People also ask

What is distinct () in C#?

C# Linq Distinct() method removes the duplicate elements from a sequence (list) and returns the distinct elements from a single data source. It comes under the Set operators' category in LINQ query operators, and the method works the same way as the DISTINCT directive in Structured Query Language (SQL).

Does HashSet allow duplicates C#?

A HashSet<T> collection is not sorted and cannot contain duplicate elements.

How does a HashSet work C#?

HashSet(IEnumerable) Initializes a new instance of the HashSet class that uses the default equality comparer for the set type, contains elements copied from the specified collection, and has sufficient capacity to accommodate the number of elements copied.


3 Answers

Judging by looking at the code through Reflector, I would have to say no.

The code ends up construct an instance of an iterator method generated class regardless of what type you give it.

This problem is also compounded by the fact that you can specify comparer objects for both the Hashset and the Distinct method, which means the optimization would only be used in very few cases.

For instance, in the following case it could actually optimize the call away, but it wouldn't be able to know that:

var set = new HashSet<int>(new MyOwnInt32Comparer());
var distinct = set.Distinct(new MyOwnInt32Comparer());

Since I give it two instances of the comparer class, and such classes usually doesn't implement equality methods, the Distinct method would have no way of knowing that the two comparer implementations are actually identical.

In any case, this is a case where the programmer knows more about the code than the runtime, so take advantage of it. Linq may be very good but it's not omnipotent, so use your knowledge to your advantage.

like image 61
Lasse V. Karlsen Avatar answered Oct 17 '22 04:10

Lasse V. Karlsen


I think No, because the input of Enumerable class for distinct method is IEnumerable and there is nothing specific for determining it's a hash set (so do not do anything).

like image 2
Saeed Amiri Avatar answered Oct 17 '22 06:10

Saeed Amiri


No, looking at the implementation in reflector, it doesn't check if the enumeration is a HashSet<T>. The underlying iterator creates a new set and fills it during enumeration, so the overhead shouldn't be that large though.

like image 2
herzmeister Avatar answered Oct 17 '22 04:10

herzmeister