I am looking for a little help with designing a query using C#/LINQ to meet the following requirements:
I have a list of companies:
    Id  Name       Email              Address
    1   Company A  [email protected]  abc
    2   Company B  [email protected]  abc
    3   Company C  [email protected]  abc
    4   Company D  [email protected]  abc
    5   Company A  [email protected]  abc
My goal is to detect duplicate items based on two fields, in this example 'name' and 'email'.
The desired output lists each distinct Name/Email combination once, with Qty showing how many companies share it:
    Id  Qty  Name       Email              Address
    1   2    Company A  [email protected]  abc  (Id/details of first)
    2   1    Company B  [email protected]  abc
    3   1    Company C  [email protected]  abc
    4   1    Company D  [email protected]  abc
If you explicitly want to use the lowest-Id record in each set of duplicates, you could use:
    var duplicates = companies
        .GroupBy(c => new { c.Name, c.Email })
        .Select(g => new { Qty = g.Count(), First = g.OrderBy(c => c.Id).First() })
        .Select(p => new
        {
            Id = p.First.Id,
            Qty = p.Qty,
            Name = p.First.Name,
            Email = p.First.Email,
            Address = p.First.Address
        });
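To try the query end to end, here is a minimal self-contained sketch. It assumes anonymous types in place of a real `Company` class (the query is the same either way), and the e-mail values are hypothetical placeholders because the originals are redacted above; rows 1 and 5 are deliberately given the same Name and Email. A `.ToList()` is added only so the result is materialized once.

```csharp
using System;
using System.Linq;

// Hypothetical sample data mirroring the question's table.
// Placeholder e-mails: rows 1 and 5 share the same Name and Email.
var companies = new[]
{
    new { Id = 1, Name = "Company A", Email = "a@example.com", Address = "abc" },
    new { Id = 2, Name = "Company B", Email = "b@example.com", Address = "abc" },
    new { Id = 3, Name = "Company C", Email = "c@example.com", Address = "abc" },
    new { Id = 4, Name = "Company D", Email = "d@example.com", Address = "abc" },
    new { Id = 5, Name = "Company A", Email = "a@example.com", Address = "abc" },
};

// Group on the (Name, Email) pair, count each group and keep the
// lowest-Id row of each group as its representative.
var duplicates = companies
    .GroupBy(c => new { c.Name, c.Email })
    .Select(g => new { Qty = g.Count(), First = g.OrderBy(c => c.Id).First() })
    .Select(p => new
    {
        p.First.Id,
        p.Qty,
        p.First.Name,
        p.First.Email,
        p.First.Address
    })
    .ToList();

// One line per distinct Name/Email pair.
foreach (var d in duplicates)
{
    Console.WriteLine($"{d.Id} {d.Qty} {d.Name} {d.Email} {d.Address}");
}
```

With this data the first line printed should be `1 2 Company A a@example.com abc`, followed by Company B, C and D with a Qty of 1, matching the shape of the desired output.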
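If you don't care which record's values are used, or if your source is already sorted by Id in ascending order (GroupBy keeps the elements of each group in their original source order), you can drop the OrderBy call. In query syntax, that version looks like this: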
    var duplicates =
        from c in companies
        group c by new { c.Name, c.Email } into g
        select new
        {
            Id = g.First().Id,
            Qty = g.Count(),
            Name = g.Key.Name,
            Email = g.Key.Email,
            Address = g.First().Address
        };
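The query above returns every group, including those with a Qty of 1, which is what the desired output asks for. If you only want the sets that actually contain duplicates, a variation along these lines should work. This is a sketch under the same assumptions as before (anonymous types instead of a `Company` class, hypothetical placeholder e-mails); `where g.Count() > 1` drops the singletons, `let` evaluates the lowest-Id row once per group, and the extra `AllIds` member (a name introduced here for illustration) lists every Id that collided.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: placeholder e-mails, rows 1 and 5 collide.
var companies = new[]
{
    new { Id = 1, Name = "Company A", Email = "a@example.com", Address = "abc" },
    new { Id = 2, Name = "Company B", Email = "b@example.com", Address = "abc" },
    new { Id = 3, Name = "Company C", Email = "c@example.com", Address = "abc" },
    new { Id = 4, Name = "Company D", Email = "d@example.com", Address = "abc" },
    new { Id = 5, Name = "Company A", Email = "a@example.com", Address = "abc" },
};

var onlyDuplicates =
    (from c in companies
     group c by new { c.Name, c.Email } into g
     where g.Count() > 1                        // keep real duplicates only
     let first = g.OrderBy(x => x.Id).First()  // lowest-Id row, computed once
     select new
     {
         first.Id,
         Qty = g.Count(),
         g.Key.Name,
         g.Key.Email,
         first.Address,
         AllIds = g.Select(x => x.Id).ToArray() // every Id in the group
     }).ToList();

foreach (var d in onlyDuplicates)
{
    Console.WriteLine($"{d.Id} {d.Qty} {d.Name} {d.Email} ids: {string.Join(", ", d.AllIds)}");
}
```

With this data only the Company A group survives the filter, and its `AllIds` holds 1 and 5.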