
Checking a list with null values for duplicates in C#

Tags: c#, list, linq

In C#, I can use something like:

    List<string> myList = new List<string>();

    if (myList.Count != myList.Distinct().Count()) {
        // there are duplicates
    }

to check for duplicate elements in a list. However, when the list contains more than one null item, this produces a false positive, because the nulls are counted as duplicates of each other. I can work around this with some sluggish code, but is there a concise way to check a list for duplicates while disregarding null values?
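To make the false positive concrete, here is a minimal sketch. The list contents are made up for illustration, and it assumes C# 9 or later for top-level statements. `Distinct()` treats all nulls as equal to one another, so a list with two nulls looks like it contains a duplicate even when every non-null string is unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: every non-null string is unique, but null appears twice.
var sample = new List<string> { "a", null, "b", null };

// Distinct() collapses the two nulls into one, so the counts differ (4 vs 3).
bool naiveCheck = sample.Count != sample.Distinct().Count();

// Filtering the nulls out first compares only the real values (2 vs 2).
var nonNull = sample.Where(s => s != null).ToList();
bool nullAwareCheck = nonNull.Count != nonNull.Distinct().Count();

Console.WriteLine($"naive: {naiveCheck}, null-aware: {nullAwareCheck}");
```

With this sample the program prints `naive: True, null-aware: False`. The null-aware version works, but it materializes an extra list and still walks the whole input.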

Cemre Mengü asked Jun 06 '13 11:06


2 Answers

If you're worried about performance, the following code stops as soon as it finds the first duplicate item. All the other solutions so far require the whole input to be iterated at least once.

    var hashset = new HashSet<string>();
    if (myList.Where(s => s != null).Any(s => !hashset.Add(s))) {
        // there are duplicates
    }

hashset.Add returns false if the item already exists in the set, and Any returns true as soon as the first true value occurs, so this will only search the input as far as the first duplicate.
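As a rough illustration of that early exit, the sketch below wraps the same idea in a helper and counts how many items are pulled from the source. The names `HasNonNullDuplicates` and `Counted`, and the sample data, are hypothetical and not part of the answer above. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Hypothetical sample: the second "a" is the third item in the list.
var items = new List<string> { "a", null, "a", "b", "c", "d" };

bool found = HasNonNullDuplicates(Counted(items, () => pulled++));

Console.WriteLine($"duplicates: {found}, items pulled: {pulled} of {items.Count}");

// Returns true as soon as a non-null item is seen for the second time.
static bool HasNonNullDuplicates<T>(IEnumerable<T> source) where T : class
{
    var seen = new HashSet<T>();
    // HashSet<T>.Add returns false when the item is already in the set.
    return source.Where(x => x != null).Any(x => !seen.Add(x));
}

// Passes items through unchanged, invoking onPull for each item the consumer requests.
static IEnumerable<T> Counted<T>(IEnumerable<T> source, Action onPull)
{
    foreach (var item in source)
    {
        onPull();
        yield return item;
    }
}
```

With this sample the program prints `duplicates: True, items pulled: 3 of 6`: enumeration stops at the second "a" and the rest of the list is never read.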

Rawling answered Sep 21 '22 01:09


I'd do this differently:

Because LINQ statements are evaluated lazily, the .Any will short-circuit: once it finds a group with more than one item it stops, so the remaining groups are never counted. Note, however, that GroupBy still has to read the whole list to build its groups before .Any sees the first one.

    var dupes = myList
        .Where(item => item != null)
        .GroupBy(item => item)
        .Any(g => g.Count() > 1);

    if (dupes) {
        // there are duplicates
    }
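To see how much of the list this query actually reads, the sketch below runs it over a source that counts every item pulled. In LINQ to Objects, `GroupBy` is deferred but buffers its entire input before yielding the first group, so only the scan over the groups can stop early. `Counted` and the sample data are hypothetical, and the sketch assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Hypothetical sample: the duplicate pair ("a") is complete after the third item.
var items = new List<string> { "a", null, "a", "b", "c", "d" };

bool dupes = Counted(items, () => pulled++)
    .Where(item => item != null)
    .GroupBy(item => item)
    .Any(g => g.Count() > 1);

Console.WriteLine($"duplicates: {dupes}, items pulled: {pulled} of {items.Count}");

// Passes items through unchanged, invoking onPull for each item the consumer requests.
static IEnumerable<T> Counted<T>(IEnumerable<T> source, Action onPull)
{
    foreach (var item in source)
    {
        onPull();
        yield return item;
    }
}
```

With this sample the program prints `duplicates: True, items pulled: 6 of 6`: the duplicate is found in the very first group, yet all six items were read to build the groups.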

EDIT: Some LINQPad benchmarking (http://pastebin.com/b9reVaJu) seems to conclude that GroupBy with Count() is faster.

EDIT 2: Rawling's answer seems to be at least 5x faster than this approach!

Dave Bish answered Sep 21 '22 01:09