 

Query to detect duplicate rows

I read data from an XML file into a DataSet using C#, and now I want to identify duplicate (completely identical) rows in that set. I tried the following grouping, and it works:

var d = from r1 in table.AsEnumerable()
       group r1 by new
       {
            t0 = r1[0],
            t1 = r1[1],
            t2 = r1[2],
            t3 = r1[3],
            t4 = r1[4],
            t5 = r1[5],
            t6 = r1[6],
            t7 = r1[7],
            t8 = r1[8],
       }
       into grp
       where grp.Count() > 1
       select grp;

But the number of data columns can vary, so I cannot hard-code the grouping key as in the query above. Do I have to generate the grouping key dynamically?

I don't want to delete the duplicates; I just want to find them.

srcnaks asked Jan 28 '13 06:01



1 Answer

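var rows = table.AsEnumerable();
// First occurrence of each distinct row, compared by column values.
var unique = rows.Distinct(DataRowComparer.Default);
// Except uses reference equality here, so only the repeated copies remain.
// Passing DataRowComparer.Default would match every row by value and return nothing.
var duplicates = rows.Except(unique);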
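
The snippet above yields only the repeated copies of each row. If the duplicates should come back grouped, as in the question's query, one possible variation is to pass `DataRowComparer.Default` to `GroupBy`, since that comparer checks every column regardless of how many the table has. This is only a sketch under assumptions: the `DuplicateRowFinder` class, the `FindDuplicateGroups` helper, and the sample table are hypothetical names introduced for illustration, and on .NET Framework the project is assumed to reference System.Data.DataSetExtensions.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DuplicateRowFinder
{
    // Groups rows whose values match in every column, whatever the column count.
    // Hypothetical helper; not part of any library.
    public static IGrouping<DataRow, DataRow>[] FindDuplicateGroups(DataTable table)
    {
        return table.AsEnumerable()
            .GroupBy(row => row, DataRowComparer.Default) // value-based key comparison
            .Where(group => group.Count() > 1)            // keep only rows seen more than once
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for the table loaded from XML.
        var table = new DataTable("sample");
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("name", typeof(string));
        table.Rows.Add(1, "a");
        table.Rows.Add(2, "b");
        table.Rows.Add(1, "a");

        foreach (var group in FindDuplicateGroups(table))
        {
            // Prints the shared values and how many times they occur.
            Console.WriteLine(string.Join(", ", group.Key.ItemArray) + " x" + group.Count());
        }
    }
}
```

Each group holds every copy of a duplicated row, including the first one, so nothing is removed from the table.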
abatishchev answered Sep 18 '22 11:09
