Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete Duplicate Rows in SQL

I have a table with unique id but duplicate row information.

I can find the rows with duplicates using this query

SELECT
    PersonAliasId, StartDateTime, GroupId, COUNT(*) as Count
FROM
    Attendance
GROUP BY
    PersonAliasId, StartDateTime, GroupId
HAVING
    COUNT(*) > 1

I can manually delete the rows while keeping the 1 I need with this query

Delete
From Attendance
Where Id IN(SELECT
    Id
FROM
    Attendance
Where PersonAliasId = 15
    and StartDateTime = '9/24/2017'
and GroupId = 1429
Order By ModifiedDateTIme Desc
Offset 1 Rows)

I am not versed in SQL enough to figure out how to use the rows in the first query to delete the duplicates leaving behind the most recent. There are over 3481 records returned by the first query to do this one by one manually.

How can I find the duplicate rows like the first query and delete all but the most recent like the second?

like image 260
Kevin Rutledge Avatar asked Jan 23 '18 16:01

Kevin Rutledge


People also ask

How do you delete duplicate rows in SQL?

SQL Delete Duplicate Rows using Group By and Having Clause According to Delete Duplicate Rows in SQL, for finding duplicate rows, you need to use the SQL GROUP BY clause. The COUNT function can be used to verify the occurrence of a row using the Group by clause, which groups data according to the given columns.

How do I remove duplicate rows in select query?

The go to solution for removing duplicate rows from your result sets is to include the distinct keyword in your select statement. It tells the query engine to remove duplicates to produce a result set in which every row is unique.


2 Answers

You can use a Common Table Expression to delete the duplicates:

WITH Cte AS(
    SELECT *,
        Rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId 
                                ORDER BY ModifiedDateTIme DESC)
    FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;

This will keep the most recent record for each PersonAliasId - StartDateTime - GroupId combination.

like image 96
Felix Pamittan Avatar answered Sep 22 '22 14:09

Felix Pamittan


Use the MAX aggregate function to identify the latest startdatetime for each group/person combination. Then delete records which do not have that latest time.

DELETE a
FROM attendance as a
INNER JOIN (  
   SELECT
        PersonAliasId, MAX(StartDateTime) AS LatestTime, GroupId,
    FROM
        Attendance
    GROUP BY
        PersonAliasId, GroupId
    HAVING
        COUNT(*) > 1
) as b
on a.personaliasid=b.personaliasid and a.groupid=b.groupid and a.startdatetime < b.latesttime
like image 33
Greg Viers Avatar answered Sep 20 '22 14:09

Greg Viers