Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing SQL Table to itself (Self-join)

I'm trying to find duplicate rows based on mixed columns. This is an example of what I have:

CREATE TABLE Test
(
   id INT PRIMARY KEY,
   test1 varchar(124),
   test2 varchar(124)
)

INSERT INTO TEST ( id, test1, test2 ) VALUES ( 1, 'A', 'B' )
INSERT INTO TEST ( id, test1, test2 ) VALUES ( 2, 'B', 'C' )

Now if I run this query:

SELECT [LEFT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2]

I would expect to get back both id's. (1 and 2), however I only ever get back the one row.

My thoughts would be that it should compare each row, but I guess this is not correct? To fix this I had changed my query to be:

SELECT [LEFT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2] 
OR [LEFT].[TEST2] = [RIGHT].[TEST1]

Which gives me both rows, but the performance degrades extremely quickly based on the number of rows.

The final solution I came up for for performance and results was to use a union:

SELECT [LEFT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2] 
UNION
SELECT [LEFT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST2] = [RIGHT].[TEST1]

But overall, I'm obviously missing an understanding of why this is not working which means that I'm probably doing something wrong. Could someone point me in the proper direction?

like image 969
Kyle Avatar asked Dec 11 '09 19:12

Kyle


People also ask

What is self join explain self join with example using SQL?

The SELF JOIN in SQL, as its name implies, is used to join a table to itself. This means that each row in a table is joined to itself and every other row in that table. However, referencing the same table more than once within a single query will result in an error. To avoid this, SQL SELF JOIN aliases are used.

What is the difference between self join and inner join?

An inner join (sometimes called a simple join) is a join of two or more tables that returns only those rows that satisfy the join condition. A self join is a join of a table to itself. This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition.

What is the advantage of self join in SQL?

A self join in SQL helps us in joining data given in the same table. In simpler words, a self join would join a table with itself. Thus, the records taken from the given table are matched and compared to other records from the very same table in the case of self join in SQL.

Can self join be implemented with two different tables having same data?

You can join different tables by their common columns using the JOIN keyword. It is also possible to join a table to itself, which is known as a self join.


2 Answers

Do not JOIN on an inequality; it seems that the JOIN and WHERE conditions are inverted.

SELECT t1.id
FROM Test t1
INNER JOIN Test t2
ON ((t1.test1 = t2.test2) OR (t1.test2 = t2.test1))
WHERE t1.id <> t2.id

Should work fine.

like image 63
Aaronaught Avatar answered Sep 21 '22 14:09

Aaronaught


You only get back both id's if you select them:

SELECT [LEFT].[ID], [RIGHT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2]

The reason that only get one ROW is that only one row (namely row #2) has a TEST1 that is equal to another row's TEST2.

like image 24
Klaus Byskov Pedersen Avatar answered Sep 24 '22 14:09

Klaus Byskov Pedersen