I am playing with LINQ to learn about it, but I can't figure out how to use Distinct
when I do not have a simple list (a simple list of integers is pretty easy to do, this is not the question). What I if want to use Distinct on a list of an Object on one or more properties of the object?
Example: If an object is Person
, with Property Id
. How can I get all Person and use Distinct
on them with the property Id
of the object?
Person1: Id=1, Name="Test1" Person2: Id=1, Name="Test1" Person3: Id=2, Name="Test2"
How can I get just Person1
and Person3
? Is that possible?
If it's not possible with LINQ, what would be the best way to have a list of Person
depending on some of its properties in .NET 3.5?
distinct in Linq to get result based on one field of the table (so do not require a whole duplicated records from table). I know writing basic query using distinct as followed: var query = (from r in table1 orderby r. Text select r).
LINQ Distinct is not that smart when it comes to custom objects. All it does is look at your list and see that it has two different objects (it doesn't care that they have the same values for the member fields). One workaround is to implement the IEquatable interface as shown here.
C# Linq Distinct() method removes the duplicate elements from a sequence (list) and returns the distinct elements from a single data source. It comes under the Set operators' category in LINQ query operators, and the method works the same way as the DISTINCT directive in Structured Query Language (SQL).
What if I want to obtain a distinct list based on one or more properties?
Simple! You want to group them and pick a winner out of the group.
List<Person> distinctPeople = allPeople .GroupBy(p => p.PersonId) .Select(g => g.First()) .ToList();
If you want to define groups on multiple properties, here's how:
List<Person> distinctPeople = allPeople .GroupBy(p => new {p.PersonId, p.FavoriteColor} ) .Select(g => g.First()) .ToList();
Note: Certain query providers are unable to resolve that each group must have at least one element, and that First is the appropriate method to call in that situation. If you find yourself working with such a query provider, FirstOrDefault may help get your query through the query provider.
Note2: Consider this answer for an EF Core (prior to EF Core 6) compatible approach. https://stackoverflow.com/a/66529949/8155
EDIT: This is now part of MoreLINQ.
What you need is a "distinct-by" effectively. I don't believe it's part of LINQ as it stands, although it's fairly easy to write:
public static IEnumerable<TSource> DistinctBy<TSource, TKey> (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector) { HashSet<TKey> seenKeys = new HashSet<TKey>(); foreach (TSource element in source) { if (seenKeys.Add(keySelector(element))) { yield return element; } } }
So to find the distinct values using just the Id
property, you could use:
var query = people.DistinctBy(p => p.Id);
And to use multiple properties, you can use anonymous types, which implement equality appropriately:
var query = people.DistinctBy(p => new { p.Id, p.Name });
Untested, but it should work (and it now at least compiles).
It assumes the default comparer for the keys though - if you want to pass in an equality comparer, just pass it on to the HashSet
constructor.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With