Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete multiple duplicate rows in table

I have multiple groups of duplicates in one table (3 records for one, 2 for another, etc) - multiple rows where more than 1 exists.

Below is what I came up with to delete them, but I have to run the script for however many duplicates there are:

set rowcount 1
delete from Table
where code in (
  select code from Table 
  group by code
  having (count(code) > 1)
)
set rowcount 0

This works well to a degree. I need to run this for every group of duplicates, and then it only deletes 1 (which is all I need right now).

like image 955
Dan Avatar asked Oct 12 '10 17:10

Dan


People also ask

How can I delete duplicate rows in a table?

If a table has duplicate rows, we can delete it by using the DELETE statement. In the case, we have a column, which is not the part of group used to evaluate the duplicate records in the table.

How delete all duplicate rows in SQL?

RANK function to SQL delete duplicate rows We can use the SQL RANK function to remove the duplicate rows as well. SQL RANK function gives unique row ID for each row irrespective of the duplicate row. In the following query, we use a RANK function with the PARTITION BY clause.

Can you remove duplicates in a table?

Follow these steps: Select the range of cells, or ensure that the active cell is in a table. On the Data tab, click Remove Duplicates . In the Remove Duplicates dialog box, unselect any columns where you don't want to remove duplicate values.


3 Answers

If you have a key column on the table, then you can use this to uniquely identify the "distinct" rows in your table.

Just use a sub query to identify a list of ID's for unique rows and then delete everything outside of this set. Something along the lines of.....

create table #TempTable
(
    ID int identity(1,1) not null primary key,
    SomeData varchar(100) not null
)

insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData3')
insert into #TempTable(SomeData) values('someData4')

select * from #TempTable

--Records to be deleted
SELECT ID
FROM #TempTable
WHERE ID NOT IN
(
    select MAX(ID)
    from #TempTable
    group by SomeData
)

--Delete them
DELETE
FROM #TempTable
WHERE ID NOT IN
(
    select MAX(ID)
    from #TempTable
    group by SomeData
)

--Final Result Set
select * from #TempTable

drop table #TempTable;

Alternatively you could use a CTE for example:

WITH UniqueRecords AS
(
    select MAX(ID) AS ID
    from #TempTable
    group by SomeData
)
DELETE A
FROM #TempTable A
    LEFT outer join UniqueRecords B on
        A.ID = B.ID
WHERE B.ID IS NULL
like image 174
John Sansom Avatar answered Sep 22 '22 15:09

John Sansom


It is frequently more efficient to copy unique rows into temporary table,
drop source table, rename back temporary table.

I reused the definition and data of #TempTable, called here as SrcTable instead, since it is impossible to rename temporary table into a regular one)

create table SrcTable
(
    ID int identity(1,1) not null primary key,
    SomeData varchar(100) not null
)

insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData3')
insert into SrcTable(SomeData) values('someData4')

by John Sansom in previous answer

-- cloning "unique" part
SELECT * INTO TempTable 
FROM SrcTable --original table
WHERE id IN  
(SELECT MAX(id) AS ID
FROM SrcTable
GROUP BY SomeData);
GO;

DROP TABLE SrcTable
GO;

sys.sp_rename 'TempTable', 'SrcTable'
like image 25

You can alternatively use ROW_NUMBER() function to filter out duplicates

;WITH [CTE_DUPLICATES] AS 
(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY SomeData)
FROM #TempTable
) 
DELETE FROM [CTE_DUPLICATES] WHERE RN > 1
like image 32
anivas Avatar answered Sep 22 '22 15:09

anivas