
Find object data duplicates in List of objects

Tags: c#, .net

Using C# 3 and .NET Framework 3.5, I have a Person class:

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

and I've got a List of them:

List<Person> persons = GetPersons();

How can I find all the Person objects in persons whose SSN is not unique in the list, remove them from persons, and ideally add them to a second list, List<Person> dupes?

The original list might look something like this:

persons = new List<Person>();
persons.Add(new Person { Id = 1, 
                         FirstName = "Chris", 
                         LastName="Columbus", 
                         SSN=111223333 }); // Is a dupe
persons.Add(new Person { Id = 1, 
                         FirstName = "E.E.", 
                         LastName="Cummings", 
                         SSN=987654321 });
persons.Add(new Person { Id = 1, 
                         FirstName = "John", 
                         LastName="Steinbeck", 
                         SSN=111223333 }); // Is a dupe
persons.Add(new Person { Id = 1, 
                         FirstName = "Yogi", 
                         LastName="Berra", 
                         SSN=123456789 }); 

And the end result would have Cummings and Berra in the original persons list and would have Columbus and Steinbeck in a list called dupes.

Many thanks!

Chris Conway asked Mar 06 '09 17:03



3 Answers

This gets you the duplicated SSNs:

var duplicatedSSN =
    from p in persons
    group p by p.SSN into g
    where g.Count() > 1
    select g.Key;

The list of duplicated persons would then be:

var duplicated = persons.FindAll( p => duplicatedSSN.Contains(p.SSN) );

And then just iterate over the duplicates and remove them.

duplicated.ForEach( dup => persons.Remove(dup) ); 
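
The three steps above can also be folded into one helper. This is only a sketch under some assumptions: the names DuplicateSplitter and ExtractDuplicates are made up for illustration, and the Person class is repeated from the question so the snippet compiles on its own. Because a LINQ query is evaluated lazily, calling Contains on it inside FindAll re-runs the grouping for every element; copying the keys into a HashSet<int> first avoids that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Person class in the question.
public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class DuplicateSplitter
{
    // Removes every Person whose SSN occurs more than once from 'persons'
    // and returns the removed items in their original order.
    public static List<Person> ExtractDuplicates(List<Person> persons)
    {
        // Evaluate the grouping once and keep only the SSNs seen more than once.
        var duplicatedSsns = new HashSet<int>(
            persons.GroupBy(p => p.SSN)
                   .Where(g => g.Count() > 1)
                   .Select(g => g.Key));

        // Collect the duplicates first, then drop them from the source list.
        List<Person> dupes = persons.FindAll(p => duplicatedSsns.Contains(p.SSN));
        persons.RemoveAll(p => duplicatedSsns.Contains(p.SSN));
        return dupes;
    }
}
```

Usage would be a single call, List<Person> dupes = DuplicateSplitter.ExtractDuplicates(persons); after it returns, persons holds only the entries with a unique SSN.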
gcores answered Oct 25 '22 03:10


Based on the recommendation by @gcores above.

If you want to add a single object for each duplicated SSN back to the list of persons, then add the following lines:

// SSN is an int in the question, so the grouping key is int, not string.
IEnumerable<IGrouping<int, Person>> query = duplicated.GroupBy(d => d.SSN);

foreach (IGrouping<int, Person> duplicateGroup in query)
{
    // Put the first member of each duplicate group back into persons.
    persons.Add(duplicateGroup.First());
}

My assumption here is that you may want to remove only the extra copies, keeping the first occurrence of each duplicated SSN in the list.
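
If keeping the first occurrence is the goal, a single pass may be enough. The following is an alternative sketch rather than the approach above: FirstOccurrenceKeeper and ExtractLaterDuplicates are hypothetical names, and the Person class is repeated from the question so the snippet stands alone. It relies on HashSet<int>.Add returning false when the value is already in the set, and unlike re-adding with persons.Add it leaves the kept entries in their original order.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Person class in the question.
public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

// Hypothetical helper, shown for illustration only.
public static class FirstOccurrenceKeeper
{
    // Keeps the first Person seen for each SSN in 'persons' and
    // returns every later Person that repeats an already-seen SSN.
    public static List<Person> ExtractLaterDuplicates(List<Person> persons)
    {
        var seen = new HashSet<int>();
        var kept = new List<Person>();
        var dupes = new List<Person>();

        foreach (Person p in persons)
        {
            // Add returns true only the first time an SSN is encountered.
            if (seen.Add(p.SSN))
                kept.Add(p);
            else
                dupes.Add(p);
        }

        // Replace the contents of the caller's list in place.
        persons.Clear();
        persons.AddRange(kept);
        return dupes;
    }
}
```

With the sample data from the question, persons would end up holding Columbus, Cummings and Berra, and the returned list would hold Steinbeck.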

Peter Ombwa answered Oct 25 '22 02:10


Thanks to gcores for getting me started down a correct path. Here's what I ended up doing:

var duplicatedSSN =
    from p in persons
    group p by p.SSN into g
    where g.Count() > 1
    select g.Key;

var duplicates = new List<Person>();

foreach (var dupeSSN in duplicatedSSN)
{
    foreach (var person in persons.FindAll(p => p.SSN == dupeSSN))
        duplicates.Add(person);
}

duplicates.ForEach(dup => persons.Remove(dup));
Chris Conway answered Oct 25 '22 04:10