How to mark duplicates in an SQL query

Question

I have an SQL query which looks at date-of-birth, last name and a soundex of first name to identify duplicates. The following query finds some 8,000 rows (which I assume means there are around 8,000 duplicate records).

select dob,last_name,soundex(first_name),count(*)
from clients
group by dob,last_name,soundex(first_name)
having count(*) >1

Almost all of the results have a count of 2, a few have a count of 3 where obviously the record existed twice in one of the two databases which were merged.

The next step I need to take is to mark one of the rows, doesn't really matter, with a duplicate flag and to mark each row with the opposite rows key. Is there a way of doing this using SQL?

RichardTheKiwi · Accepted Answer

This should do what you are after, the UPDATE in one go.

UPDATE FROM clients c
INNER JOIN
(
  select dob,last_name,soundex(first_name),MIN(id) as keep
  from clients
  group by dob,last_name,soundex(first_name)
  having count(*) >1
) k
ON c.dob=k.dob AND c.last_name=k.last_name AND soundex(c.first_name)=soundex(k.first_name)
SET duplicateid = NULLIF(k.keep, c.id),
    hasduplicate = (k.keep = c.id)

It assumes you have 3 columns not stated in the question

id: primary key
duplicateid: points to the dup being kept
hasduplicate: boolean, marks the one to keep

Jason LeBrun · Answer

Well, you could use SELECT DISTINCT, and then mark a single row as "not duplicate" -- then search for rows that are "not duplicate" to find the duplicate.

How to mark duplicates in an SQL query

Tags:

sql

mysql

Darren

2 Answers

RichardTheKiwi

Jason LeBrun

Recent Activity

Donate For Us

How to mark duplicates in an SQL query

Tags:

sql

mysql

Darren

2 Answers

RichardTheKiwi

Jason LeBrun

Related questions

Recent Activity

Donate For Us