Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# 3.0: Need to return duplicates from a List<>

I have a List<> of objects in C# and I need a way to return those objects that are considered duplicates within the list. I do not need the Distinct resultset, I need a list of those items that I will be deleting from my repository.

For the sake of this example, lets say I have a list of "Car" types and I need to know which of these cars are the same color as another in the list. Here are the cars in the list and their color property:

Car1.Color = Red;

Car2.Color = Blue;

Car3.Color = Green;

Car4.Color = Red;

Car5.Color = Red;

For this example I need the result (IEnumerable<>, List<>, or whatever) to contain Car4 and Car5 because I want to delete these from my repository or db so that I only have one car per color in my repository. Any help would be appreciated.

like image 912
SaaS Developer Avatar asked Jan 29 '09 22:01

SaaS Developer


2 Answers

I inadvertently coded this yesterday, when I was trying to write a "distinct by a projection". I included a ! when I shouldn't have, but this time it's just right:

public static IEnumerable<TSource> DuplicatesBy<TSource, TKey>
    (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        // Yield it if the key hasn't actually been added - i.e. it
        // was already in the set
        if (!seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}

You'd then call it with:

var duplicates = cars.DuplicatesBy(car => car.Color);
like image 56
Jon Skeet Avatar answered Oct 21 '22 14:10

Jon Skeet


var duplicates = from car in cars
                 group car by car.Color into grouped
                 from car in grouped.Skip(1)
                 select car;

This groups the cars by color and then skips the first result from each group, returning the remainder from each group flattened into a single sequence.

If you have particular requirements about which one you want to keep, e.g. if the car has an Id property and you want to keep the car with the lowest Id, then you could add some ordering in there, e.g.

var duplicates = from car in cars
                 group car by car.Color into grouped
                 from car in grouped.OrderBy(c => c.Id).Skip(1)
                 select car;
like image 31
Greg Beech Avatar answered Oct 21 '22 12:10

Greg Beech