Checking for duplicates in a List of Objects C#

Tags:

linq

I am looking for a really fast way to check for duplicates in a list of objects.

I was thinking of simply looping through the list and doing a manual comparison that way, but I thought that linq might provide a more elegant solution...

Suppose I have an object...

    public class dupeCheckee
    {
        public string checkThis { get; set; }
        public string checkThat { get; set; }

        // The constructor must be public so the list below can call it.
        public dupeCheckee(string val, string val2)
        {
            checkThis = val;
            checkThat = val2;
        }
    }

And I have a list of those objects

    List<dupeCheckee> dupList = new List<dupeCheckee>();
    dupList.Add(new dupeCheckee("test1", "value1"));
    dupList.Add(new dupeCheckee("test2", "value1"));
    dupList.Add(new dupeCheckee("test3", "value1"));
    dupList.Add(new dupeCheckee("test1", "value1")); // dupe
    dupList.Add(new dupeCheckee("test2", "value1")); // dupe...
    dupList.Add(new dupeCheckee("test4", "value1"));
    dupList.Add(new dupeCheckee("test5", "value1"));
    dupList.Add(new dupeCheckee("test1", "value2")); // not dupe

I need to find the dupes in that list. When I find them, I need to run some additional logic, not necessarily remove them.

When I use LINQ, my GroupBy call somehow fails to compile with this error...

'System.Collections.Generic.List<dupeCheckee>' does not contain a definition for 'GroupBy' and no extension method 'GroupBy' accepting a first argument of type 'System.Collections.Generic.List<dupeCheckee>' could be found (are you missing a using directive or an assembly reference?) 

Which is telling me that I am missing a library. I am having a hard time figuring out which one though.

Once I figure that out, how would I check for those two conditions together, i.e. that the same combination of checkThis and checkThat occurs more than once?

UPDATE: What I came up with

This is the linq query that I came up with after doing quick research...

test.Count != test.Select(c => new { c.checkThat, c.checkThis }).Distinct().Count() 

I am not certain if this is definitely better than this answer...

    var duplicates = test.GroupBy(x => new { x.checkThis, x.checkThat })
                         .Where(x => x.Skip(1).Any());

I know I can put the first statement into an if else clause. I also ran a quick test. The duplicates list gives me back 1 when I was expecting 0 but it did correctly call the fact that I had duplicates in one of the sets that I used...

The other methodology does exactly as I expect it to. Here are the data sets that I use to test this out....

Dupes:

    List<dupeCheckee> test = new List<dupeCheckee>
    {
        new dupeCheckee("test0", "test1"),
        new dupeCheckee("test1", "test2"),
        new dupeCheckee("test2", "test3"),
        new dupeCheckee("test3", "test3"),
        new dupeCheckee("test0", "test5"),
        new dupeCheckee("test1", "test6"),
        new dupeCheckee("test2", "test7"),
        new dupeCheckee("test3", "test8"),
        new dupeCheckee("test0", "test5"),
        new dupeCheckee("test1", "test1"),
        new dupeCheckee("test2", "test2"),
        new dupeCheckee("test3", "test3"),
        new dupeCheckee("test4", "test4"),
    };

No dupes...

    List<dupeCheckee> test2 = new List<dupeCheckee>
    {
        new dupeCheckee("test0", "test1"),
        new dupeCheckee("test1", "test2"),
        new dupeCheckee("test2", "test3"),
        new dupeCheckee("test3", "test3"),
        new dupeCheckee("test4", "test5"),
        new dupeCheckee("test5", "test6"),
        new dupeCheckee("test6", "test7"),
        new dupeCheckee("test7", "test8"),
        new dupeCheckee("test8", "test5"),
        new dupeCheckee("test9", "test1"),
        new dupeCheckee("test2", "test2"),
        new dupeCheckee("test3", "test3"),
        new dupeCheckee("test4", "test4"),
    };

SoftwareSavant asked Apr 24 '13 16:04


2 Answers

GroupBy is an extension method defined in the System.Linq namespace, so you need to import it (add using System.Linq; at the top of the file)

then you can do

    var dupes = dupList.GroupBy(x => new { x.checkThis, x.checkThat })
                       .Where(x => x.Skip(1).Any());

This gives you one group per duplicated combination of checkThis and checkThat; each group contains every item that shares that combination.
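Grouping on an anonymous type works because the C# compiler generates Equals and GetHashCode for anonymous types from their property values, whereas a class that does not override them is compared by reference. Here is a minimal sketch of the difference, using the class and data from the question; the names GroupingKeyDemo, DuplicateGroupsByObject and DuplicateGroupsByValues are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }

    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
}

// Hypothetical helper class, not part of the original answer.
public static class GroupingKeyDemo
{
    // Groups by the object itself: dupeCheckee does not override Equals,
    // so two separately constructed instances never land in the same group.
    public static int DuplicateGroupsByObject(IEnumerable<dupeCheckee> items)
    {
        return items.GroupBy(x => x).Count(g => g.Skip(1).Any());
    }

    // Groups by an anonymous type: its compiler-generated Equals/GetHashCode
    // compare property values, so equal pairs share a group.
    public static int DuplicateGroupsByValues(IEnumerable<dupeCheckee> items)
    {
        return items.GroupBy(x => new { x.checkThis, x.checkThat })
                    .Count(g => g.Skip(1).Any());
    }

    public static void Main()
    {
        var dupList = new List<dupeCheckee>
        {
            new dupeCheckee("test1", "value1"),
            new dupeCheckee("test2", "value1"),
            new dupeCheckee("test3", "value1"),
            new dupeCheckee("test1", "value1"), // dupe
            new dupeCheckee("test2", "value1"), // dupe
            new dupeCheckee("test4", "value1"),
            new dupeCheckee("test5", "value1"),
            new dupeCheckee("test1", "value2"), // not dupe
        };

        Console.WriteLine(DuplicateGroupsByObject(dupList)); // prints 0
        Console.WriteLine(DuplicateGroupsByValues(dupList)); // prints 2
    }
}
```

If dupeCheckee overrode Equals and GetHashCode itself, grouping by the object would presumably behave like the second method; the anonymous type simply avoids having to touch the class.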

The test for duplicates would then be

    var hasDupes = dupList.GroupBy(x => new { x.checkThis, x.checkThat })
                          .Where(x => x.Skip(1).Any())
                          .Any();

Alternatively, call ToList() or ToArray() to force the query to be evaluated once, so that you can both check for dupes and examine them without running the grouping again.

For example:

    var dupes = dupList.GroupBy(x => new { x.checkThis, x.checkThat })
                       .Where(x => x.Skip(1).Any())
                       .ToArray();
    if (dupes.Any())
    {
        foreach (var dupeGroup in dupes)
        {
            // Count() includes the first occurrence, so subtract 1 to get the extra copies.
            Console.WriteLine(string.Format("checkThis={0},checkThat={1} has {2} duplicates",
                                            dupeGroup.Key.checkThis,
                                            dupeGroup.Key.checkThat,
                                            dupeGroup.Count() - 1));
        }
    }

Alternatively

    var dupes = dupList.Select((x, i) => new { index = i, value = x })
                       .GroupBy(x => new { x.value.checkThis, x.value.checkThat })
                       .Where(x => x.Skip(1).Any());

This gives you the same groups, but each item in a group stores its original position in the index property and the object itself in the value property.
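As a rough illustration of what the indexed variant returns, the sketch below collects, for each duplicated pair, the positions at which it occurs in the original list. The helper names IndexedDupes and DuplicateIndexGroups are assumptions made for this example; the query itself is the one shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }

    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
}

// Hypothetical helper class, not part of the original answer.
public static class IndexedDupes
{
    // Returns one int[] per duplicated (checkThis, checkThat) pair,
    // holding the original list positions of every item with that pair.
    public static List<int[]> DuplicateIndexGroups(IEnumerable<dupeCheckee> items)
    {
        return items
            .Select((x, i) => new { index = i, value = x })
            .GroupBy(x => new { x.value.checkThis, x.value.checkThat })
            .Where(g => g.Skip(1).Any())
            .Select(g => g.Select(x => x.index).ToArray())
            .ToList();
    }

    public static void Main()
    {
        var dupList = new List<dupeCheckee>
        {
            new dupeCheckee("test1", "value1"), // index 0
            new dupeCheckee("test2", "value1"), // index 1
            new dupeCheckee("test3", "value1"), // index 2
            new dupeCheckee("test1", "value1"), // index 3, dupe of 0
            new dupeCheckee("test2", "value1"), // index 4, dupe of 1
        };

        foreach (var indexes in DuplicateIndexGroups(dupList))
        {
            // prints "0, 3" and then "1, 4"
            Console.WriteLine(string.Join(", ", indexes));
        }
    }
}
```

Knowing the indexes is handy when the follow-up logic has to update or flag specific entries in the original list rather than just report that a duplicate exists.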

Bob Vale answered Sep 28 '22 07:09


There are plenty of working solutions already, but I think the following one is more transparent and easier to understand than those above:

    var hasDuplicatedEntries = ListWithPossibleDuplicates
                                   .GroupBy(YourGroupingExpression)
                                   .Any(e => e.Count() > 1);
    if (hasDuplicatedEntries)
    {
        // Do whatever you want when the list contains duplicates
    }
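If only the yes/no answer matters, a HashSet-based loop may be worth considering as well. This is a different technique from the GroupBy query above, not part of it: GroupBy builds every group before Any can inspect the first one, while the loop below can stop at the first repeated key. The sketch assumes a hypothetical helper named HasDuplicates and uses System.Tuple as the key, because tuples compare their components by value.

```csharp
using System;
using System.Collections.Generic;

public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }

    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
}

// Hypothetical helper class, not part of the original answer.
public static class DuplicateCheck
{
    // Returns true as soon as two items produce an equal key.
    public static bool HasDuplicates<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already in the set.
            if (!seen.Add(keySelector(item)))
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        var withDupes = new List<dupeCheckee>
        {
            new dupeCheckee("test1", "value1"),
            new dupeCheckee("test2", "value1"),
            new dupeCheckee("test1", "value1"), // same pair as the first entry
        };
        var withoutDupes = new List<dupeCheckee>
        {
            new dupeCheckee("test1", "value1"),
            new dupeCheckee("test1", "value2"),
        };

        // Tuple.Create yields keys that are equal when both strings are equal.
        Console.WriteLine(HasDuplicates(withDupes, x => Tuple.Create(x.checkThis, x.checkThat)));    // prints True
        Console.WriteLine(HasDuplicates(withoutDupes, x => Tuple.Create(x.checkThis, x.checkThat))); // prints False
    }
}
```

When the duplicates themselves are needed for further processing, the GroupBy version remains the more direct choice, since this loop only reports that at least one exists.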

Maris answered Sep 28 '22 08:09