C# How to filter a list and remove duplicates?

Tags:

c#

.net

linq

I have a List of type X. Each item has several fields, and I need to return only the unique records from the list. One of the fields/properties (OIndex) contains a timestamp, and I need to use it to decide which record to keep. The list looks like this:

> 2c55-Checked-branchDeb-20160501121315-05
> 2c60-Checked-branchDeb-20160506121315-06
> 2c55-Checked-branchDeb-20160601121315-07
> 2c55-Checked-branchDeb-20160601141315-07
> 2c60-Checked-branchDeb-20160720121315-08

In the example above, the last field is the record ID, so record "07" appears twice. The timestamp is the fourth field. I want all the records except the third, which is a duplicate; the latest version of record "07" is the fourth line.

I started writing the code but I'm struggling. So far:

List<X> originalRecords = GetSomeMethod(); //this method returns our list above

var duplicateKeys = originalRecords.GroupBy(x => x.Record)  //x.Record is the record as shown above "05", "06" etc
                        .Where(g => g.Count() > 1)
                        .Select(y => y.Key);

What do I do now that I have the duplicate keys? I think I need to go through the originalRecords list again and check whether each record has a duplicate key, then use Substring to extract the datetime, store it somewhere, remove whichever record is not the latest, and save the filtered list. Thanks

user2906420 asked Dec 25 '22 02:12


1 Answer

You don't need to find the duplicate keys explicitly; you can simply sort each group by timestamp, newest first, and take the first record:

var res = originalRecords
    .GroupBy(x => x.RecordId)
    .Select(g => g.OrderByDescending(x => x.DateTimeField).First());

> There is no DateTimeField in my code. I simply have a string field which contains the datetime together with other data. The record does, however, have a RecordId field.

You can split your records on a dash, grab the date-time portion, and sort on it. Your date/time is in a format that lets you sort lexicographically, so you can skip parsing the date.
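To see why parsing can be skipped, here is a minimal sketch that compares the two orderings. It assumes the timestamps are fixed-width digits in yyyyMMddHHmmss form, as in the sample data; the class and method names are hypothetical and not part of the original code.

```csharp
using System;
using System.Globalization;

public static class TimestampOrdering
{
    // Hypothetical helper: parses a fixed-width yyyyMMddHHmmss timestamp.
    public static DateTime Parse(string stamp) =>
        DateTime.ParseExact(stamp, "yyyyMMddHHmmss", CultureInfo.InvariantCulture);

    // True when ordinal string order agrees with chronological order
    // for the two timestamps.
    public static bool OrdersAgree(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a, b)) ==
        Math.Sign(Parse(a).CompareTo(Parse(b)));
}
```

For example, TimestampOrdering.OrdersAgree("20160601121315", "20160601141315") returns true. This only holds while every timestamp has the same width and layout; if the format varies, parsing to DateTime is the safer choice.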

Assuming that no field contains a dash and that all strings are formatted the same way, the expression x.TextString.Split('-')[3] gives you the timestamp portion of your record:

var res = originalRecords
    .GroupBy(x => x.RecordId)
    .Select(g => g.OrderByDescending(x => x.TextString.Split('-')[3]).First());
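For reference, here is a self-contained sketch of the same query run against the five sample lines from the question. The Record class is a hypothetical stand-in for the asker's type X, and it assumes that RecordId is the last dash-separated field and TextString holds the whole line. Passing StringComparer.Ordinal is an optional addition that makes the string comparison explicit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's type X.
public class Record
{
    public string RecordId { get; set; } = "";
    public string TextString { get; set; } = "";
}

public static class Deduplicator
{
    // For each RecordId, keeps the entry whose timestamp
    // (the fourth dash-separated field) sorts last.
    public static List<Record> LatestPerRecord(IEnumerable<Record> records) =>
        records
            .GroupBy(r => r.RecordId)
            .Select(g => g
                .OrderByDescending(r => r.TextString.Split('-')[3], StringComparer.Ordinal)
                .First())
            .ToList();

    // Builds the sample list from the question.
    public static List<Record> Sample()
    {
        var lines = new[]
        {
            "2c55-Checked-branchDeb-20160501121315-05",
            "2c60-Checked-branchDeb-20160506121315-06",
            "2c55-Checked-branchDeb-20160601121315-07",
            "2c55-Checked-branchDeb-20160601141315-07",
            "2c60-Checked-branchDeb-20160720121315-08",
        };
        return lines
            .Select(l => new Record { RecordId = l.Split('-')[4], TextString = l })
            .ToList();
    }
}
```

Calling Deduplicator.LatestPerRecord(Deduplicator.Sample()) returns four records: the third sample line is dropped and the fourth is kept as the latest version of record "07".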
Sergey Kalinichenko answered Jan 27 '23 23:01