Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query to find duplicate rows and return both IDs

Tags:

sql

mysql

I have a table of customers:

id | name | email
--------------------------
 1 | Rob  | [email protected]
 2 | Jim  | [email protected]
 3 | Dave | [email protected]
 4 | Fred | [email protected]
 5 | Ben  | [email protected]
 6 | Tom  | [email protected]

I'm trying to write an SQL query that returns all the rows with duplicate email addresses but... I'd like the query result to return the original ID and the duplicate ID. (The original ID is the first occurrence of the duplicate email.)

The desired result:

original_id | duplicate_id | email
-------------------------------------------
          1 |            2 | [email protected]
          3 |            5 | [email protected]
          3 |            6 | [email protected]

My research so far has indicated it might involve some kind of self join, but I'm stuck on the actual implementation. Can anyone help?

like image 494
Rob Stanford Avatar asked Mar 07 '26 04:03

Rob Stanford


2 Answers

We could handle this using a join, but I might actually go for an option which generates a CSV list of id corresponding to duplicates:

SELECT
    email,
    GROUP_CONCAT(id ORDER BY id) AS duplicate_ids
FROM yourTable
GROUP BY email
HAVING COUNT(*) > 1

Functionally speaking, this gives you the same information you wanted in your question, but in what is a much simplified form in my opinion. Because we order the id values when concatenating, the original id will always appear first, on the left side of the CSV list. Also, if you have many duplicates your requested output could become verbose and harder to read.

Output:

enter image description here

Demo

like image 180
Tim Biegeleisen Avatar answered Mar 09 '26 17:03

Tim Biegeleisen


select
  orig.original_id,
  t.id as duplicate_id,
  orig.email
from t
  inner join (select min(id) as original_id, email
              from t
              group by email
              having count(*)>1) orig on orig.email = t.email
having t.id!=orig.original_id

By the subquery we can find all ids for emails with duplicates.

Then we join the subquery by email and for each one use minimal id as original

UPDATE: http://rextester.com/BLIHK20984 cloned @Tim Biegeleisen's answer

like image 36
StanislavL Avatar answered Mar 09 '26 16:03

StanislavL



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!