Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using LINQ to find / delete duplicates

I have a table that contains a bunch of duplicates. These are exact duplicates, minus the primary key column, which is an integer identity column.

Using EF and LINQ, how do I find the duplicates and delete them, leaving only one copy.

I found the duplicates and a count of each using SQL and SSMS. I'm just don't know where to start with LINQ.

Thanks!

like image 652
DenaliHardtail Avatar asked Dec 21 '10 19:12

DenaliHardtail


2 Answers

Off the top of my head (untested):

var q = from r in Context.Table
        group r by new { FieldA = r.FieldA, FieldB = r.FieldB, // ...
            into g
        where g.Count() > 1
        select g;
foreach (var g in q)
{
    var dupes = g.Skip(1).ToList();
    foreach (var record in dupes)
    {
        Context.DeleteObject(record);
    }
}
Context.SaveChanges();
like image 107
Craig Stuntz Avatar answered Sep 27 '22 21:09

Craig Stuntz


Building on @Craig Stuntz's answer with a "one liner" alternative:

var duplicates = db.Table.GroupBy(a => new { a.FieldA, a.FieldB, ...})
                         .Where(a => a.Count() > 1)
                         .SelectMany(a => a.ToList());

foreach (var d in duplicates)
{
     db.DeleteObject(d);
}

db.SaveChanges();
like image 37
Jorge Rodrigues dos Santos Avatar answered Sep 27 '22 22:09

Jorge Rodrigues dos Santos