I am looking for a really fast way to check for duplicates in a list of objects.
I was thinking of simply looping through the list and comparing the items manually, but I thought that LINQ might provide a more elegant solution...
Suppose I have an object...
    public class dupeCheckee
    {
        public string checkThis { get; set; }
        public string checkThat { get; set; }

        // Must be public so that code outside the class can construct instances.
        public dupeCheckee(string val, string val2)
        {
            checkThis = val;
            checkThat = val2;
        }
    }
And I have a list of those objects
    List<dupeCheckee> dupList = new List<dupeCheckee>();
    dupList.Add(new dupeCheckee("test1", "value1"));
    dupList.Add(new dupeCheckee("test2", "value1"));
    dupList.Add(new dupeCheckee("test3", "value1"));
    dupList.Add(new dupeCheckee("test1", "value1")); // dupe
    dupList.Add(new dupeCheckee("test2", "value1")); // dupe...
    dupList.Add(new dupeCheckee("test4", "value1"));
    dupList.Add(new dupeCheckee("test5", "value1"));
    dupList.Add(new dupeCheckee("test1", "value2")); // not dupe
I need to find the dupes in that list. When I find them, I need to run some additional logic, not necessarily remove them.
When I use LINQ, my GroupBy call somehow fails to compile with this error...
'System.Collections.Generic.List<dupeCheckee>' does not contain a definition for 'GroupBy' and no extension method 'GroupBy' accepting a first argument of type 'System.Collections.Generic.List<dupeCheckee>' could be found (are you missing a using directive or an assembly reference?)
Which is telling me that I am missing a library. I am having a hard time figuring out which one though.
Once I figure that out, how would I check for those two conditions together, i.e. that the same combination of checkThis and checkThat occurs more than once?
UPDATE: What I came up with
This is the LINQ query that I came up with after doing some quick research...
test.Count != test.Select(c => new { c.checkThat, c.checkThis }).Distinct().Count()
I am not certain if this is definitely better than this answer...
    var duplicates = test.GroupBy(x => new {x.checkThis, x.checkThat})
                         .Where(x => x.Skip(1).Any());
I know I can put the first statement into an if/else clause. I also ran a quick test. For one of my sets, the duplicates query gave me back 1 group when I was expecting 0, but it did correctly flag the duplicates in the other set...
The other approach does exactly what I expect it to. Here are the data sets that I used to test this...
Dupes:
    List<DupeCheckee> test = new List<DupeCheckee>{
        new DupeCheckee("test0", "test1"),
        new DupeCheckee("test1", "test2"),
        new DupeCheckee("test2", "test3"),
        new DupeCheckee("test3", "test3"),
        new DupeCheckee("test0", "test5"),
        new DupeCheckee("test1", "test6"),
        new DupeCheckee("test2", "test7"),
        new DupeCheckee("test3", "test8"),
        new DupeCheckee("test0", "test5"), // dupe
        new DupeCheckee("test1", "test1"),
        new DupeCheckee("test2", "test2"),
        new DupeCheckee("test3", "test3"), // dupe
        new DupeCheckee("test4", "test4"),
    };
No dupes...
    List<DupeCheckee> test2 = new List<DupeCheckee>{
        new DupeCheckee("test0", "test1"),
        new DupeCheckee("test1", "test2"),
        new DupeCheckee("test2", "test3"),
        new DupeCheckee("test3", "test3"),
        new DupeCheckee("test4", "test5"),
        new DupeCheckee("test5", "test6"),
        new DupeCheckee("test6", "test7"),
        new DupeCheckee("test7", "test8"),
        new DupeCheckee("test8", "test5"),
        new DupeCheckee("test9", "test1"),
        new DupeCheckee("test2", "test2"),
        new DupeCheckee("test3", "test3"), // note: repeats the 4th entry, so this set does contain one duplicate pair
        new DupeCheckee("test4", "test4"),
    };
Place all the items in a set; if the count of the set differs from the count of the list, there is a duplicate. This can be more efficient than Distinct if you stop as soon as an item turns out to be in the set already, because then there is no need to go through the whole list. Use the list's Count property rather than calling the Count() method.
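To illustrate the set idea with an early exit, here is a minimal sketch. The names DuplicateCheck and HasDuplicates are hypothetical (they are not from the question), and the usage line below assumes C# 7 or later so that a value tuple can serve as the comparison key; value tuples compare by their contents, whereas the class itself would compare by reference unless it overrides Equals and GetHashCode.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateCheck
{
    // Hypothetical helper: returns true as soon as two items produce the same key.
    public static bool HasDuplicates<T, TKey>(IEnumerable<T> items, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in items)
        {
            // HashSet<T>.Add returns false when the key is already in the set,
            // so we can stop without visiting the rest of the list.
            if (!seen.Add(keySelector(item)))
            {
                return true;
            }
        }
        return false;
    }
}
```

With the list from the question, this could be called as DuplicateCheck.HasDuplicates(dupList, x => (x.checkThis, x.checkThat)). It only answers whether duplicates exist; it does not tell you which items they are.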
One of the most common ways to find duplicates is the brute-force method, which compares each element of the list to every other element. This solution has a time complexity of O(n^2) and is mostly of academic interest.
You need to import the System.Linq namespace (e.g. using System.Linq;), then you can do
    var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                       .Where(x => x.Skip(1).Any());
This will give you groups with all the duplicates
The test for duplicates would then be
    var hasDupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                          .Where(x => x.Skip(1).Any())
                          .Any();
or even call ToList() or ToArray() to force the calculation of the result, and then you can both check for dupes and examine them.
e.g.
    var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                       .Where(x => x.Skip(1).Any())
                       .ToArray();
    if (dupes.Any())
    {
        // The loop variable is dupeList; the original mixed it up with duplist.
        foreach (var dupeList in dupes)
        {
            Console.WriteLine(string.Format("checkThis={0},checkThat={1} has {2} duplicates",
                dupeList.Key.checkThis, dupeList.Key.checkThat, dupeList.Count() - 1));
        }
    }
Alternatively
    var dupes = dupList.Select((x, i) => new { index = i, value = x})
                       .GroupBy(x => new {x.value.checkThis, x.value.checkThat})
                       .Where(x => x.Skip(1).Any());
This gives you the groups, where each item in a group stores its original index in the property index and the item itself in the property value.
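As a rough sketch of how the indexed groups might be consumed, the helper below collects the positions of every repeated pair. The names DuplicateIndexDemo and FindDuplicateIndexes are hypothetical, and the items are modelled as value tuples (an assumption, C# 7 or later) instead of the question's class so that the block stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateIndexDemo
{
    // Hypothetical helper: for each pair that occurs more than once,
    // returns the list positions at which it occurs.
    public static Dictionary<(string checkThis, string checkThat), List<int>> FindDuplicateIndexes(
        IEnumerable<(string checkThis, string checkThat)> items)
    {
        return items
            .Select((x, i) => new { index = i, value = x })   // remember each item's position
            .GroupBy(x => x.value)                            // value tuples compare by content
            .Where(g => g.Skip(1).Any())                      // keep groups with at least two items
            .ToDictionary(g => g.Key, g => g.Select(x => x.index).ToList());
    }
}
```

For the eight items in the question, this would report the pair test1/value1 at indexes 0 and 3 and the pair test2/value1 at indexes 1 and 4, which is enough to run further logic on the specific entries.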
There are a huge number of working solutions here, but I think the following one is more transparent and easier to understand than all of the above:
    var hasDuplicatedEntries = ListWithPossibleDuplicates
        .GroupBy(YourGroupingExpression)
        .Any(e => e.Count() > 1);
    if (hasDuplicatedEntries)
    {
        // Do whatever you want when the list contains duplicates
    }