
Remove all but 1 object in list based on grouping

Tags: c#, .net, list, linq

I have a list of objects with multiple properties in it. Here is the object.

public class DataPoint
{
    private readonly string uniqueId;
    public DataPoint(string uid)
    {
        this.uniqueId = uid;
    }

    public string UniqueId
    {
        get
        {
            return this.uniqueId;
        }
    }

    public string ScannerID { get; set; }

    public DateTime ScanDate { get; set; }
}

Now in my code, I have a giant list of these: hundreds, maybe a few thousand.

Each data point object belongs to some type of scanner, and has a scan date. I want to remove any data points that were scanned on the same day except for the last one for a given machine.

I tried using LINQ as follows but this did not work. I still have many duplicate data points.

this.allData = this.allData.GroupBy(g => g.ScannerID)
                   .Select(s => s.OrderByDescending(o => o.ScanDate))
                   .First()
                   .ToList();

I need to group the data points by scanner ID, because there could be data points scanned on the same day but on a different machine. I only need the last data point for a day if there are multiple.

Edit for clarification - By last data point I mean the last scanned data point for a given scan date for a given machine. I hope that helps. So when grouping by scanner ID, I then tried to order by scan date and then only keep the last scan date for days with multiple scans.

Here is some test data for 2 machines:

Unique ID      Scanner ID  Scan Date
A1JN221169H07  49374       2003-02-21 15:12:53.000
A1JN22116BK08  49374       2003-02-21 15:14:08.000
A1JN22116DN09  49374       2003-02-21 15:15:23.000
A1JN22116FP0A  49374       2003-02-21 15:16:37.000
A1JOA050U900J  80354       2004-10-05 10:53:24.000
A1JOA050UB30K  80354       2004-10-05 10:54:39.000
A1JOA050UD60L  80354       2004-10-05 10:55:54.000
A1JOA050UF80M  80354       2004-10-05 10:57:08.000
A1JOA0600O202  80354       2004-10-06 08:38:26.000
RXC asked Sep 25 '15 12:09


2 Answers

I want to remove any data points that were scanned on the same day except for the last one for a given machine.

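So I assume you want to group by both the scan day (ScanDate.Date) and ScannerID. Here is the code:

var result = dataPoints.GroupBy(i => new { i.ScanDate.Date, i.ScannerID })
                       .OrderByDescending(g => g.Key.Date)
                       // Within each (day, scanner) group, keep the latest scan.
                       .Select(g => g.OrderByDescending(x => x.ScanDate).First())
                       .ToList();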
Hossein Narimani Rad answered Oct 03 '22 05:10


If I understand you correctly, this is what you want:

var result = dataPoints.GroupBy(i => new { i.ScanDate.Date, i.ScannerID })
                       .Select(i => i.OrderBy(x => x.ScanDate).Last())
                       .ToList();

This groups by the scanner ID and the day (ScanDate.Date zeroes out the time portion). Within each group it orders by ScanDate (since every item in a group falls on the same day, this effectively orders by time of day) and takes the last. So for each day you get one result per scanner: the one with the latest ScanDate for that day.
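
To check this against the test data from the question, here is a minimal self-contained sketch. The names DataPointFilter, LatestPerScannerPerDay, Demo and SampleData are hypothetical helpers introduced only for this illustration, and the DataPoint class is a shortened version of the one in the question (it assumes C# 6 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Shortened version of the DataPoint class from the question.
public class DataPoint
{
    public DataPoint(string uid) { UniqueId = uid; }
    public string UniqueId { get; }
    public string ScannerID { get; set; }
    public DateTime ScanDate { get; set; }
}

public static class DataPointFilter
{
    // Hypothetical helper: keeps only the latest scan per scanner per calendar day.
    public static List<DataPoint> LatestPerScannerPerDay(IEnumerable<DataPoint> points)
    {
        return points
            // Anonymous types compare by value, so (day, scanner) works as a composite key.
            .GroupBy(p => new { p.ScanDate.Date, p.ScannerID })
            // All items in a group share a day, so ordering by ScanDate orders by time of day.
            .Select(g => g.OrderBy(p => p.ScanDate).Last())
            .ToList();
    }
}

public static class Demo
{
    // Small factory to keep the sample data readable.
    private static DataPoint Make(string uid, string scanner, DateTime date)
    {
        return new DataPoint(uid) { ScannerID = scanner, ScanDate = date };
    }

    // The nine rows of test data from the question.
    public static List<DataPoint> SampleData()
    {
        return new List<DataPoint>
        {
            Make("A1JN221169H07", "49374", new DateTime(2003, 2, 21, 15, 12, 53)),
            Make("A1JN22116BK08", "49374", new DateTime(2003, 2, 21, 15, 14, 8)),
            Make("A1JN22116DN09", "49374", new DateTime(2003, 2, 21, 15, 15, 23)),
            Make("A1JN22116FP0A", "49374", new DateTime(2003, 2, 21, 15, 16, 37)),
            Make("A1JOA050U900J", "80354", new DateTime(2004, 10, 5, 10, 53, 24)),
            Make("A1JOA050UB30K", "80354", new DateTime(2004, 10, 5, 10, 54, 39)),
            Make("A1JOA050UD60L", "80354", new DateTime(2004, 10, 5, 10, 55, 54)),
            Make("A1JOA050UF80M", "80354", new DateTime(2004, 10, 5, 10, 57, 8)),
            Make("A1JOA0600O202", "80354", new DateTime(2004, 10, 6, 8, 38, 26)),
        };
    }

    public static void Main()
    {
        foreach (var p in DataPointFilter.LatestPerScannerPerDay(SampleData()))
        {
            Console.WriteLine(p.UniqueId + " " + p.ScannerID);
        }
        // Prints:
        // A1JN22116FP0A 49374
        // A1JOA050UF80M 80354
        // A1JOA0600O202 80354
    }
}
```

Nine data points go in and three come out: one for scanner 49374 on 2003-02-21, and one each for scanner 80354 on 2004-10-05 and 2004-10-06. To update the list in place as in the question, the result can be assigned back, e.g. this.allData = DataPointFilter.LatestPerScannerPerDay(this.allData);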

juharr answered Oct 03 '22 05:10