Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to mark duplicates in an SQL query

Tags:

sql

mysql

I have an SQL query which looks at date-of-birth, last name and a soundex of first name to identify duplicates. The following query finds some 8,000 rows (which I assume means there are around 8,000 duplicate records).

select dob,last_name,soundex(first_name),count(*)
from clients
group by dob,last_name,soundex(first_name)
having count(*) >1

Almost all of the results have a count of 2, a few have a count of 3 where obviously the record existed twice in one of the two databases which were merged.

The next step I need to take is to mark one of the rows, doesn't really matter, with a duplicate flag and to mark each row with the opposite rows key. Is there a way of doing this using SQL?

like image 712
Darren Avatar asked Jan 23 '11 21:01

Darren


2 Answers

This should do what you are after, the UPDATE in one go.

UPDATE FROM clients c
INNER JOIN
(
  select dob,last_name,soundex(first_name),MIN(id) as keep
  from clients
  group by dob,last_name,soundex(first_name)
  having count(*) >1
) k
ON c.dob=k.dob AND c.last_name=k.last_name AND soundex(c.first_name)=soundex(k.first_name)
SET duplicateid = NULLIF(k.keep, c.id),
    hasduplicate = (k.keep = c.id)

It assumes you have 3 columns not stated in the question

  • id: primary key
  • duplicateid: points to the dup being kept
  • hasduplicate: boolean, marks the one to keep
like image 161
RichardTheKiwi Avatar answered Sep 25 '22 16:09

RichardTheKiwi


Well, you could use SELECT DISTINCT, and then mark a single row as "not duplicate" -- then search for rows that are "not duplicate" to find the duplicate.

like image 23
Jason LeBrun Avatar answered Sep 22 '22 16:09

Jason LeBrun