I have checked the whole site and googled on the net but was unable to find a simple solution to this problem.
I have a datatable which has about 20 columns and 10K rows. I need to remove the duplicate rows in this datatable based on 4 key columns. Doesn't .Net have a function which does this? The function closest to what I am looking for was datatable.DefaultView.ToTable(true, array of columns to display), But this function does a distinct on all the columns.
It would be great if someone could help me with this.
EDIT: I am sorry for not being clear on this. This datatable is being created by reading a CSV file and not from a DB. So using an SQL query is not an option.
If a table has duplicate rows, we can delete it by using the DELETE statement. In the case, we have a column, which is not the part of group used to evaluate the duplicate records in the table.
Sets, built-in functions, and iterative methods can be used to remove duplicates from a list.
Excel can remove duplicate values, using all columns or a subset to determine uniqueness of a row. Duplicates are simply removed, with no way to recover them later. Dataiku's Distinct recipe identifies and removes duplicate rows within a dataset.
You can use Linq to Datasets. Check this. Something like this:
// Fill the DataSet.
DataSet ds = new DataSet();
ds.Locale = CultureInfo.InvariantCulture;
FillDataSet(ds);
List<DataRow> rows = new List<DataRow>();
DataTable contact = ds.Tables["Contact"];
// Get 100 rows from the Contact table.
IEnumerable<DataRow> query = (from c in contact.AsEnumerable()
select c).Take(100);
DataTable contactsTableWith100Rows = query.CopyToDataTable();
// Add 100 rows to the list.
foreach (DataRow row in contactsTableWith100Rows.Rows)
rows.Add(row);
// Create duplicate rows by adding the same 100 rows to the list.
foreach (DataRow row in contactsTableWith100Rows.Rows)
rows.Add(row);
DataTable table =
System.Data.DataTableExtensions.CopyToDataTable<DataRow>(rows);
// Find the unique contacts in the table.
IEnumerable<DataRow> uniqueContacts =
table.AsEnumerable().Distinct(DataRowComparer.Default);
Console.WriteLine("Unique contacts:");
foreach (DataRow uniqueContact in uniqueContacts)
{
Console.WriteLine(uniqueContact.Field<Int32>("ContactID"));
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With