Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing duplicates from SQL Join

Tags:

The following is a hypothetical situation this which is close to my real problem. Table1

recid   firstname    lastname   company 1       A             B          AAA 2       D             E          DEF 3       G             H          IJK 4       A             B          ABC 

I have a table2 that looks like this

recid   firstname    lastname   company 10      A             B          ABC 20      D             E          DEF 30      M             D          DIM 40      A             B          CCC 

Now if I join the table on recid, it will give 0 results, there will be no duplicates because recid is unique. But if I join on firstname and lastname columns, which are not unique and there are duplicates, I get duplicates on inner join. The more columns I add on join, the worse it becomes (more duplicates are created).

In the above simple situation, how can I remove duplicates in the following query? I want to compare firstname and lastname, if they match, I return firstname, lastname and recid from table2

SELECT DISTINCT * FROM (SELECT recid, first, last FROM table1) a INNER JOIN (SELECT recid, first,last FROM table2) b ON a.first = b.first 

The script is here if anyone wants to play with it in future

CREATE TABLE table1 (recid INT NOT NULL PRIMARY KEY, first varchar(20), last varchar(20), company varchar(20)) CREATE TABLE table2 (recid INT NOT NULL PRIMARY KEY, first varchar(20), last varchar(20), company varchar(20))  INSERT INTO TABLE1 VALUES(1,'A','B','ABC') INSERT INTO TABLE1 VALUES(2,'D','E','DEF') INSERT INTO TABLE1 VALUES(3,'M','N','MNO') INSERT INTO TABLE1 VALUES(4,'A','B','ABC')  INSERT INTO TABLE2 VALUES(10,'A','B','ABC') INSERT INTO TABLE2 VALUES(20,'D','E','DEF') INSERT INTO TABLE2 VALUES(30,'Q','R','QRS') INSERT INTO TABLE2 VALUES(40,'A','B','ABC') 
like image 761
Hammad Khan Avatar asked Aug 16 '11 20:08

Hammad Khan


People also ask

How remove duplicates in SQL join?

Solution. Select column values in a specific order within rows to make rows with duplicate sets of values identical. Then you can use SELECT DISTINCT to remove duplicates. Alternatively, retrieve rows in such a way that near-duplicates are not even selected.

How do you find duplicates in SQL using joins?

Check for Duplicates in Multiple Tables With INNER JOINUse the INNER JOIN function to find duplicates that exist in multiple tables. Sample syntax for an INNER JOIN function looks like this: SELECT column_name FROM table1 INNER JOIN table2 ON table1. column_name = table2.

How do you avoid duplicates in inner join?

Solution 1 It is very common, therefore, to return few than all of your rows - especially with so many joins, each having the potential to eliminate some rows. You should use LEFT JOIN or RIGHT JOIN in appropriate locations, instead, which will return a NULL for unmatched records - but will return all of them.


1 Answers

You don't want to do a join per se, you're merely testing for existence/set inclusion.

I don't know what current flavor of SQL you're coding in, but this should work.

SELECT MAX(recid), firstname, lastname  FROM table2 T2 WHERE EXISTS (SELECT * FROM table1 WHERE firstname = T2.firstame AND lastname = T2.lastname) GROUP BY lastname, firstname 

If you want to implement as a join, leaving the code largely the same:

i.e.

SELECT max(t2.recid), t2.firstame, t2.lastname  FROM Table2 T2  INNER JOIN Table1 T1      ON T2.firstname = t1.firstname and t2.lastname = t1.lastname GROUP BY t2.firstname, t2.lastname  

Depending on the DBMS, an inner join may be implemented differently to an Exists (semi-join vs join) but the optimizer can sometimes figure it out anyway and chose the correct operator regardless of which way you write it.

like image 99
Code Magician Avatar answered Oct 11 '22 06:10

Code Magician