
Performance of HashSet<T> and Linq queries

Last week I was handed some code and asked to improve its performance. I soon noticed that it uses a lot of HashSet<T> objects to store big collections (from 10,000 to more than 100,000 objects). The authors chose HashSet<T> for performance reasons.

The code only populates each HashSet<T> with objects and then uses LINQ to run queries across multiple collections. Most of the queries join one or more HashSet<T> collections, or retrieve specific object(s) from a collection with First() or Where().

I'm wondering whether this gives us any performance advantage over a normal List<T>, since all the LINQ extension methods used in the code are written for IEnumerable<T>.

On the internet a lot of articles say that List would be faster, but some say that HashSet handles huge collections much better than List.

I hope someone can give me more advice.

Thanks.

Chouffie asked Dec 21 '22 06:12


1 Answer

If you only use LINQ queries, you don't get any performance advantage, since operators such as Where() and First() just enumerate the collection element by element. In fact, List<T> may perform better because of its contiguous internal storage.
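To make this concrete, here is a minimal sketch that counts how often a Where() predicate runs against each collection type. The data (100,000 integers) and the variable names are made up for illustration and only stand in for the objects described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: 100,000 integers stand in for the question's objects.
var list = Enumerable.Range(0, 100_000).ToList();
var set = new HashSet<int>(list);

int listCalls = 0, setCalls = 0;

// Where() is written against IEnumerable<T>, so it tests every element
// of either collection; the hash table inside the set is never consulted.
int listMatches = list.Where(x => { listCalls++; return x % 1000 == 0; }).Count();
int setMatches = set.Where(x => { setCalls++; return x % 1000 == 0; }).Count();

Console.WriteLine($"List<T>:    {listMatches} matches, {listCalls} predicate calls");
Console.WriteLine($"HashSet<T>: {setMatches} matches, {setCalls} predicate calls");
```

Both collections need the same number of predicate calls, so any difference in wall-clock time comes from how quickly each one can be enumerated, not from hashing.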

To get the performance benefit of a HashSet<T>, you need to use the ISet<T> methods (such as IntersectWith or UnionWith), ideally with another HashSet<T> as the argument since, looking at the code, it is optimized for this case. Further, only operations that take advantage of the member objects' hash codes, such as equality testing, will be faster, because the performance of HashSet<T> rests on the O(1) characteristic of hash lookups. Operations that don't use the members' hash codes, such as filtering on a member property rather than on the members themselves, are O(N), the same as with a List<T>.
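A rough sketch of the difference, assuming a hypothetical customer tuple (Id, City) and invented collection names; none of these come from the code in the question. It contrasts a hash-based membership test and a set-to-set operation with a filter on a single property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type: a value tuple, which has built-in equality and hash codes.
static IEnumerable<(int Id, string City)> MakeCustomers(int start, int count) =>
    Enumerable.Range(start, count)
              .Select(i => (Id: i, City: i % 2 == 0 ? "Ghent" : "Bruges"));

var customers = new HashSet<(int Id, string City)>(MakeCustomers(0, 100_000));
var active = new HashSet<(int Id, string City)>(MakeCustomers(50_000, 100_000));

// Membership test on the element itself: uses its hash code, O(1) on average.
bool found = customers.Contains((75_000, "Ghent"));

// ISet<T> operation between two HashSet<T> instances.
// IntersectWith modifies the set in place, so work on a copy.
var common = new HashSet<(int Id, string City)>(customers);
common.IntersectWith(active);

// Filtering on one property ignores the hash codes: an O(N) scan, as with List<T>.
int inGhent = customers.Count(c => c.City == "Ghent");

Console.WriteLine($"found={found}, common={common.Count}, inGhent={inGhent}");
```

If lookups by a single property (for example Id) dominate, the set only helps when equality and hashing are defined on that property; otherwise the query falls back to the same linear scan.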

codekaizen answered Jan 07 '23 22:01
