I am validating a table that holds transaction-level data from an e-commerce site, and I need to find the exact errors.
I would like help finding duplicate records in a 50-column table on SQL Server.
Suppose my data is:
OrderNo  shoppername  amountpayed  city  Item
1        Sam          10           A     Iphone
1        Sam          10           A     Iphone       <-- duplicate to be detected
1        Sam          5            A     Ipod
2        John         20           B     Macbook
3        John         25           B     Macbookair
4        Jack         5            A     Ipod
Suppose I use the following query:
SELECT shoppername, COUNT(*) AS cnt FROM dbo.sales GROUP BY shoppername HAVING COUNT(*) > 1
It returns:
Sam 3
John 2
But I don't want to find duplicates based on just one or two columns. I want to find rows that are duplicated across all the columns together. I want the result to be:
1 Sam 10 A Iphone
One way to find duplicate records in a table is the GROUP BY clause. GROUP BY arranges identical data into groups, usually together with an aggregate function such as COUNT: if the grouping columns hold the same values in different rows, those rows are placed in the same group.
To select duplicate values, you need to create groups of rows with the same values and then select the groups with counts greater than one. You can achieve that by using GROUP BY and a HAVING clause.
The query uses the window version of the COUNT aggregate function: the count is computed over Col1 partitions, and the outer query filters out records whose Col1 value appears only once.
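The query that description refers to is not included here, so the following is only a minimal T-SQL sketch of the same windowed-COUNT technique on the question's sample data. The temp table #sales and its column types are assumptions for illustration, and the partition lists all five columns (rather than a single Col1) so that only rows identical in every column are reported. DROP TABLE IF EXISTS needs SQL Server 2016 or later.

```sql
-- Hypothetical copy of the question's sample data; column types are assumptions.
DROP TABLE IF EXISTS #sales;

CREATE TABLE #sales (
    OrderNo     int,
    shoppername varchar(50),
    amountPayed decimal(10, 2),
    city        varchar(50),
    item        varchar(50)
);

INSERT INTO #sales (OrderNo, shoppername, amountPayed, city, item)
VALUES (1, 'Sam', 10, 'A', 'Iphone'),
       (1, 'Sam', 10, 'A', 'Iphone'),   -- exact duplicate of the previous row
       (1, 'Sam', 5, 'A', 'Ipod'),
       (2, 'John', 20, 'B', 'Macbook'),
       (3, 'John', 25, 'B', 'Macbookair'),
       (4, 'Jack', 5, 'A', 'Ipod');

-- COUNT(*) OVER (PARTITION BY <all columns>) tags every row with the size of its
-- group of identical rows; the outer query keeps rows whose group has more than one member.
SELECT OrderNo, shoppername, amountPayed, city, item
FROM (
    SELECT *,
           COUNT(*) OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item) AS cnt
    FROM #sales
) AS t
WHERE t.cnt > 1;
```

Unlike the GROUP BY approach, this returns every copy of a duplicated row (here, both Iphone rows) instead of one row per group, so the outer query can also select columns that are not part of the partition.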
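Partition by every column so that only rows identical in all columns get a row number above 1:
WITH x AS (
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY OrderNo)
    FROM dbo.sales
)
SELECT * FROM x WHERE rn > 1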
You can remove the duplicates by keeping the same WITH x AS (...) definition and replacing the final SELECT statement with:
delete x where rn > 1
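A complete, self-contained sketch of the ROW_NUMBER delete, run against a hypothetical #sales temp table holding the question's sample data (column types are assumptions; SQL Server 2016+ for DROP TABLE IF EXISTS). Because the copies are indistinguishable, ORDER BY (SELECT NULL) is used to let the server number them in any order; rn = 1 is the copy that survives.

```sql
-- Hypothetical copy of the question's sample data; column types are assumptions.
DROP TABLE IF EXISTS #sales;

CREATE TABLE #sales (
    OrderNo     int,
    shoppername varchar(50),
    amountPayed decimal(10, 2),
    city        varchar(50),
    item        varchar(50)
);

INSERT INTO #sales (OrderNo, shoppername, amountPayed, city, item)
VALUES (1, 'Sam', 10, 'A', 'Iphone'),
       (1, 'Sam', 10, 'A', 'Iphone'),
       (1, 'Sam', 5, 'A', 'Ipod'),
       (2, 'John', 20, 'B', 'Macbook'),
       (3, 'John', 25, 'B', 'Macbookair'),
       (4, 'Jack', 5, 'A', 'Ipod');

-- Number the rows inside each group of rows that match on every column.
WITH x AS (
    SELECT *,
           rn = ROW_NUMBER() OVER (
                    PARTITION BY OrderNo, shoppername, amountPayed, city, item
                    ORDER BY (SELECT NULL))
    FROM #sales
)
-- Deleting through the single-table CTE removes the underlying #sales rows.
DELETE FROM x WHERE rn > 1;

-- Five rows remain: one copy of the Iphone order plus the four unique rows.
SELECT OrderNo, shoppername, amountPayed, city, item FROM #sales;
```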
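To list each duplicated row once, together with how many times it occurs, group by every column (for the real 50-column table, all 50 columns must appear in the GROUP BY list). On the sample data this returns 1 Sam 10 A Iphone with cnt = 2:
SELECT OrderNo, shoppername, amountPayed, city, item, COUNT(*) AS cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1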
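With 50 columns, typing the column list by hand is tedious and error-prone. One option, sketched here under the assumption of SQL Server 2017 or later (for STRING_AGG ... WITHIN GROUP), is to build the list from the catalog views and run the GROUP BY query through sp_executesql. The demo uses a hypothetical #sales temp table; for the real table, read sys.columns with OBJECT_ID(N'dbo.sales') and replace #sales with dbo.sales in the dynamic string.

```sql
-- Hypothetical copy of the question's sample data; column types are assumptions.
DROP TABLE IF EXISTS #sales;

CREATE TABLE #sales (
    OrderNo     int,
    shoppername varchar(50),
    amountPayed decimal(10, 2),
    city        varchar(50),
    item        varchar(50)
);

INSERT INTO #sales (OrderNo, shoppername, amountPayed, city, item)
VALUES (1, 'Sam', 10, 'A', 'Iphone'),
       (1, 'Sam', 10, 'A', 'Iphone'),
       (1, 'Sam', 5, 'A', 'Ipod'),
       (2, 'John', 20, 'B', 'Macbook'),
       (3, 'John', 25, 'B', 'Macbookair'),
       (4, 'Jack', 5, 'A', 'Ipod');

-- Build "[OrderNo], [shoppername], ..." in column order.
-- Temp tables live in tempdb, so their metadata is in tempdb.sys.columns.
-- The CAST to nvarchar(max) keeps STRING_AGG from hitting its 8000-byte limit on wide tables.
DECLARE @cols nvarchar(max) = (
    SELECT STRING_AGG(CAST(QUOTENAME(c.name) AS nvarchar(max)), N', ')
               WITHIN GROUP (ORDER BY c.column_id)
    FROM tempdb.sys.columns AS c
    WHERE c.object_id = OBJECT_ID(N'tempdb..#sales')
);

DECLARE @sql nvarchar(max) =
    N'SELECT ' + @cols + N', COUNT(*) AS cnt FROM #sales GROUP BY ' + @cols
    + N' HAVING COUNT(*) > 1;';

-- The dynamic batch runs in a nested scope, which can still see #sales.
EXEC sys.sp_executesql @sql;
```

GROUP BY treats NULLs as equal, so two rows that are NULL in the same columns and equal elsewhere count as duplicates. Columns of type text, ntext, image or xml cannot be grouped, so they would have to be excluded from the list (and then no longer take part in the comparison) or converted first.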