In C#, I can use something like:
    List<string> myList = new List<string>();
    if (myList.Count != myList.Distinct().Count())
    {
        // there are duplicates
    }
to check for duplicate elements in a list. However, when the list contains more than one null item, this produces a false positive. I can work around this with some sluggish code, but is there a concise way to check for duplicates in a list while disregarding null values?
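To make the false positive concrete, here is a minimal sketch. The sample data is hypothetical and not from the question: a list with no repeated strings but two null entries. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: no repeated strings, but two null entries.
var myList = new List<string> { "a", null, "b", null };

// Distinct() collapses the two nulls into one, so the counts differ.
bool naiveCheck = myList.Count != myList.Distinct().Count();

// Filtering nulls out first compares only the real values.
var nonNull = myList.Where(s => s != null).ToList();
bool nullSafeCheck = nonNull.Count != nonNull.Distinct().Count();

// Prints "naive: True, ignoring nulls: False"
Console.WriteLine($"naive: {naiveCheck}, ignoring nulls: {nullSafeCheck}");
```

The null-safe variant here still walks the whole list, which is the cost the answers below try to avoid.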
If you're worried about performance, the following code will stop as soon as it finds the first duplicate item - all the other solutions so far require the whole input to be iterated at least once.
    var hashset = new HashSet<string>();
    if (myList.Where(s => s != null).Any(s => !hashset.Add(s)))
    {
        // there are duplicates
    }
hashset.Add returns false if the item already exists in the set, and Any returns true as soon as the first true value occurs, so this will only search the input as far as the first duplicate.
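If the check is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the name HasNonNullDuplicates and the sample lists are hypothetical, not part of LINQ or of the answer above, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of LINQ): true if any non-null value repeats.
static bool HasNonNullDuplicates<T>(IEnumerable<T> source) where T : class
{
    var seen = new HashSet<T>();
    // Add returns false for a value already in the set, so Any stops there.
    return source.Where(item => item != null).Any(item => !seen.Add(item));
}

var withRepeatedNulls = new List<string> { "a", null, "b", null };
var withRealDuplicate = new List<string> { "a", null, "b", "a" };

Console.WriteLine(HasNonNullDuplicates(withRepeatedNulls)); // False
Console.WriteLine(HasNonNullDuplicates(withRealDuplicate)); // True
```

A fresh HashSet is created on every call, so the helper can be reused without state leaking between checks.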
I'd do this differently:
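Given that LINQ statements are evaluated lazily, the .Any will short-circuit on the first group with more than one item, meaning you don't have to count every group if there are duplicates, and as such it should be more efficient. Note that GroupBy itself still reads the whole list before it yields its first group, so only the counting is cut short.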
    var dupes = myList
        .Where(item => item != null)
        .GroupBy(item => item)
        .Any(g => g.Count() > 1);

    if (dupes)
    {
        // there are duplicates
    }
EDIT: Some LINQPad benchmarking (http://pastebin.com/b9reVaJu) seems to conclude that GroupBy with Count() is faster.
EDIT 2: Rawling's answer below seems at least 5x faster than this approach!
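One likely contributor to that gap is how much of the input each query reads before it can answer. The sketch below is not the benchmark linked above. It uses a hypothetical 1,000-item list with the duplicate at the very front and counts how many elements each approach pulls. The Counted wrapper and the variable names are assumptions made for illustration, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: the duplicate "a" sits at index 1 of 1,000 items.
var myList = new List<string> { "a", "a" };
myList.AddRange(Enumerable.Range(0, 998).Select(i => "item" + i));

int pulled = 0;

// Wraps the list so every element handed to a query is counted.
IEnumerable<string> Counted()
{
    foreach (var s in myList)
    {
        pulled++;
        yield return s;
    }
}

// HashSet approach: stops at the first failed Add.
var hashset = new HashSet<string>();
bool viaHashSet = Counted().Where(s => s != null).Any(s => !hashset.Add(s));
int pulledByHashSet = pulled;

// GroupBy approach: groups are built from the full input before Any runs.
pulled = 0;
bool viaGroupBy = Counted().Where(s => s != null)
                           .GroupBy(s => s)
                           .Any(g => g.Count() > 1);
int pulledByGroupBy = pulled;

// Prints "HashSet: 2 items, GroupBy: 1000 items"
Console.WriteLine($"HashSet: {pulledByHashSet} items, GroupBy: {pulledByGroupBy} items");
```

Both queries report a duplicate, but only the HashSet version gets to skip the rest of the list. When the duplicate sits near the end, or there is none, both read everything and the difference shrinks.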