Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL to remove rows with duplicated value while keeping one

Say I have this table

id | data | value
-----------------
 1 |   a  |   A
 2 |   a  |   A
 3 |   a  |   A
 4 |   a  |   B
 5 |   b  |   C
 6 |   c  |   A
 7 |   c  |   C
 8 |   c  |   C

I want to remove those rows with duplicated value for each data while keeping the one with the min id, e.g. the result will be

id | data | value
-----------------
 1 |   a  |   A
 4 |   a  |   B
 5 |   b  |   C
 6 |   c  |   A
 7 |   c  |   C

I know a way to do it is to do a union like:

SELECT 1 [id], 'a' [data], 'A' [value] INTO #test UNION SELECT 2, 'a', 'A'
UNION SELECT 3, 'a', 'A' UNION SELECT 4, 'a', 'B'
UNION SELECT 5, 'b', 'C' UNION SELECT 6, 'c', 'A'
UNION SELECT 7, 'c', 'C' UNION SELECT 8, 'c', 'C'

SELECT * FROM #test WHERE id NOT IN (
    SELECT MIN(id) FROM #test
    GROUP BY [data], [value]
    HAVING COUNT(1) > 1
    UNION
    SELECT MIN(id) FROM #test
    GROUP BY [data], [value]
    HAVING COUNT(1) <= 1
)

but this solution has to repeat the same group by twice (consider the real case is a massive group by with > 20 columns)

I would prefer a simpler answer with less code as oppose to complex ones. Is there any more concise way to code this?

Thank you

like image 944
user1589188 Avatar asked May 06 '14 13:05

user1589188


2 Answers

You can use one of the methods below:

  1. Using WITH CTE:

     WITH CTE AS 
     (SELECT *,RN=ROW_NUMBER() OVER(PARTITION BY data,value ORDER BY id) 
      FROM TableName)
     DELETE FROM CTE WHERE RN>1
    

Explanation:

This query will select the contents of the table along with a row number RN. And then delete the records with RN >1 (which would be the duplicates).

This Fiddle shows the records which are going to be deleted using this method.

  1. Using NOT IN:

     DELETE FROM TableName
     WHERE id NOT IN
           (SELECT MIN(id) as id
            FROM TableName
            GROUP BY data,value)
    

Explanation:

With the given example, inner query will return ids (1,6,4,5,7). The outer query will delete records from table whose id NOT IN (1,6,4,5,7).

This fiddle shows the records which are going to be deleted using this method.

Suggestion: Use the first method since it is faster than the latter. Also, it manages to keep only one record if id field is also duplicated for the same data and value.

like image 165
Raging Bull Avatar answered Oct 16 '22 22:10

Raging Bull


I want to add MYSQL solution for this query

Suggestion 1 : MySQL prior to version 8.0 doesn't support the WITH clause

Suggestion 2 : throw this error (you can't specify table TableName for update in FROM clause

So the solution will be

DELETE FROM TableName WHERE id NOT IN
  (SELECT MIN(id) as id
   FROM (select * from TableName) as t1
   GROUP BY data,value) as t2;
like image 40
Choussamaster Avatar answered Oct 16 '22 22:10

Choussamaster