 

Remove duplicates from DataTable and custom IEqualityComparer<DataRow>

How do I implement IEqualityComparer<DataRow> to remove duplicate rows from a DataTable with the following structure:

ID primary key, col_1, col_2, col_3, col_4

The default comparer doesn't work because each row has its own unique primary key.

How can I implement an IEqualityComparer<DataRow> that skips the primary key and compares only the remaining data?

I have something like this:

public class DataRowComparer : IEqualityComparer<DataRow>
{
 public bool Equals(DataRow x, DataRow y)
 {
  return
   x.ItemArray.Except(new object[] { x[x.Table.PrimaryKey[0].ColumnName] }) ==
   y.ItemArray.Except(new object[] { y[y.Table.PrimaryKey[0].ColumnName] });
 }

 public int GetHashCode(DataRow obj)
 {
  return obj.ToString().GetHashCode();
 }
}

and

public static DataTable RemoveDuplicates(this DataTable table)
{
    return (table.Rows.Count > 0)
        ? table.AsEnumerable().Distinct(new DataRowComparer()).CopyToDataTable()
        : table;
}

but it only calls GetHashCode() and never calls Equals().

abatishchev asked Oct 21 '09 08:10


1 Answer

That is the way Distinct works. Internally it uses the GetHashCode method to group candidate rows, and it calls Equals only when two hash codes match. You can write GetHashCode to do what you need. Something like:

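public int GetHashCode(DataRow obj)
{
    // Skip key columns by column, not by value: Except() is a set operation
    // and would also drop any data value that happens to equal the key.
    DataColumn[] key = obj.Table.PrimaryKey;
    int hash = 0;
    unchecked
    {
        foreach (DataColumn column in obj.Table.Columns)
        {
            if (Array.IndexOf(key, column) >= 0) continue;
            hash = (hash * 397) ^ obj[column].GetHashCode();
        }
    }
    return hash;
}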

Since you know your data better, you can probably come up with a better way to generate the hash.
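Whatever hash you choose, Equals has to agree with it: rows that hash the same are then compared with Equals, and `==` on two sequences only compares references, so it never reports a match. Below is a minimal sketch of a complete comparer under a few assumptions: the class name NonKeyDataRowComparer and the helper NonKeyValues are made up for illustration, and both rows are assumed to come from tables with the same column layout.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical name: compares rows on every column except the primary key.
public sealed class NonKeyDataRowComparer : IEqualityComparer<DataRow>
{
    // Hypothetical helper: values of all columns that are not part of the
    // table's primary key, in column order.
    private static IEnumerable<object> NonKeyValues(DataRow row)
    {
        DataColumn[] key = row.Table.PrimaryKey;
        return row.Table.Columns
            .Cast<DataColumn>()
            .Where(column => Array.IndexOf(key, column) < 0)
            .Select(column => row[column]);
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // SequenceEqual compares element by element using object.Equals.
        return NonKeyValues(x).SequenceEqual(NonKeyValues(y));
    }

    public int GetHashCode(DataRow row)
    {
        unchecked
        {
            int hash = 17;
            // Missing values are stored as DBNull.Value, so value is never null.
            foreach (object value in NonKeyValues(row))
            {
                hash = (hash * 397) ^ value.GetHashCode();
            }
            return hash;
        }
    }
}
```

It plugs into the extension method from the question unchanged: table.AsEnumerable().Distinct(new NonKeyDataRowComparer()).CopyToDataTable(). The first row of each duplicate group is the one that survives.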

Mike Two answered Oct 03 '22 08:10