Getting distinct and ordered members from a list of strings: LINQ or HashSet, which one is faster / better suited?

I have a big list of strings (about 5k-20k entries) that I need to order and also to remove duplicates from.

I've done this in two ways now, once with a HashSet and once solely with LINQ. Tests with that number of entries did not show a big difference, but I'm wondering which approach would be better suited.

The two approaches (myList is of type List<String>):

LINQ: I'm using one LINQ statement to order the list and get the distinct values from it.

myList = myList.OrderBy(q => q).Distinct().ToList();

HashSet: I'm using a HashSet to remove all duplicates, and then I'm ordering the list.

myList = new HashSet<String>(myList).ToList<String>();
myList = myList.OrderBy(q => q).ToList();

As I said, the tests I ran showed about the same time consumption for both methods, but I'm still wondering whether one method is better than the other and, if so, why (the code is for a high-performance part and I need to get every millisecond I can out of it).

Thomas asked Aug 21 '14 08:08



1 Answer

If you're really concerned about every nanosecond, then

myList = myList.Distinct().OrderBy(q => q).ToList();

might be slightly faster than:

myList = myList.OrderBy(q => q).Distinct().ToList();

if there are a large number of duplicates, since Distinct() then shrinks the sequence before the more expensive sort runs.
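One way to check this on your own data is a rough Stopwatch comparison of the two orderings. The following is only a sketch: the generated test data (20,000 strings with about 2,000 distinct values), the helper name Measure and the run count are assumptions made for illustration, and the timings will vary by machine and runtime, so no particular result is implied.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical test data: 20,000 strings drawn from at most 2,000 distinct values,
// so the large majority of entries are duplicates.
var random = new Random(42);
List<string> myList = Enumerable.Range(0, 20000)
    .Select(_ => "item" + random.Next(2000))
    .ToList();

// Hypothetical helper: runs the query once as a warm-up (so JIT compilation
// is not timed), then times the given number of runs and returns the last
// result together with the total elapsed time.
(List<string> Result, TimeSpan Elapsed) Measure(Func<List<string>> query, int runs = 50)
{
    List<string> result = query();
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < runs; i++)
    {
        result = query();
    }
    stopwatch.Stop();
    return (result, stopwatch.Elapsed);
}

// Remove duplicates first, then sort the smaller sequence.
var distinctFirst = Measure(() => myList.Distinct().OrderBy(q => q).ToList());

// Sort everything first, then remove duplicates.
var orderFirst = Measure(() => myList.OrderBy(q => q).Distinct().ToList());

Console.WriteLine($"Distinct then OrderBy: {distinctFirst.Elapsed.TotalMilliseconds:F1} ms");
Console.WriteLine($"OrderBy then Distinct: {orderFirst.Elapsed.TotalMilliseconds:F1} ms");
```

For numbers you intend to rely on, a dedicated benchmarking harness is likely a better fit than a hand-rolled Stopwatch loop; this only gives a first impression.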

The LINQ method is more readable and, as others have said, will have similar performance to explicitly creating a HashSet<T>. In fact, it may be slightly faster if the original list is already sorted, since the LINQ method will preserve the initial order before sorting, while explicitly creating a HashSet<T> will enumerate in an undefined order.
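To make the comparison concrete, here is a small sketch that runs the LINQ-only version and the explicit HashSet<T> version side by side on an already sorted input. The sample values and the variable names viaLinq and viaHashSet are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample input: already sorted, with duplicates.
var myList = new List<string> { "apple", "apple", "banana", "cherry", "cherry" };

// LINQ only: remove duplicates, then sort.
List<string> viaLinq = myList.Distinct().OrderBy(q => q).ToList();

// Explicit HashSet<string>: its enumeration order is not guaranteed,
// so the sort afterwards is what establishes the final order.
List<string> viaHashSet = new HashSet<string>(myList).OrderBy(q => q).ToList();

Console.WriteLine(string.Join(", ", viaLinq));    // apple, banana, cherry
Console.WriteLine(string.Join(", ", viaHashSet)); // apple, banana, cherry
```

Both variants end with the same sorted, duplicate-free list; they differ only in how the duplicates are dropped and in the order in which the elements reach the sort.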

Joe answered Oct 13 '22 01:10