Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Duplicated results when performing INNER JOIN

I have 2 simple tables that I would like to perform an INNER JOIN with, but the problem is that I'm getting duplicated (for the columns str1 and str2) results:

CREATE TABLE #A (Id INT, str1 nvarchar(50), str2 nvarchar(50))
insert into #A  values (1, 'a', 'b')
insert into #A  values (2, 'a', 'b')

CREATE TABLE #B (Id INT, str1 nvarchar(50), str2 nvarchar(50))
insert into #B values (7, 'a', 'b')
insert into #B  values (8, 'a', 'b')

select * from #A a
INNER JOIN #B b ON a.str1 = b.str1 AND a.str2 = b.str2  

It gave me 4 records when I really wanted 2.

What I got:
id | str1 | str2| id | str1 | str2
1 | a | b | 7 | a | b
2 | a | b | 7 | a | b
1 | a | b | 8 | a | b
2 | a | b | 8 | a | b

What I really wanted:
1 a | b | 7 | a | b
2 a | b | 8 | a | b

Can anyone help? I know this is achievable using a cursor and loop, but I'd like to avoid it and only use some type of JOIN if possible.

like image 334
codetc Avatar asked Jan 14 '23 22:01

codetc


2 Answers

SELECT 
    a.id AS a_id, a.str1 AS a_str1, a.str2 AS a_str2, 
    b.id AS b_id, b.str1 AS b_str1, b.str2 AS b_str2
FROM 
    ( SELECT *
           , ROW_NUMBER() OVER (PARTITION BY str1, str2 ORDER BY id) AS rn
      FROM #A 
    ) a
  INNER JOIN 
    ( SELECT *
           , ROW_NUMBER() OVER (PARTITION BY str1, str2 ORDER BY id) AS rn
      FROM #B 
    ) b 
    ON  a.str1 = b.str1 
    AND a.str2 = b.str2 
    AND a.rn = b.rn ;

If you have more rows in one or the other tables for the same (str1, str2) combination, you can choose which ones will be returned by changing INNER join to either LEFT, RIGHT or FULL join.

like image 176
ypercubeᵀᴹ Avatar answered Jan 22 '23 15:01

ypercubeᵀᴹ


You can accomplish a sort of matching with a query like the following (SQL 2005 and up):

WITH A AS (
   SELECT
      Seq = Row_Number() OVER (PARTITION BY Str1, Str2 ORDER BY Id),
      *
   FROM #A
), B AS (
   SELECT
      Seq = Row_Number() OVER (PARTITION BY Str1, Str2 ORDER BY Id),
      *
   FROM #B
)
SELECT
   A.Id, A.Str1, A.Str2, B.Id, B.Str1, B.Str2
FROM
   A
   FULL JOIN B
      ON A.Seq = B.Seq AND A.Str1 = B.Str1 AND A.Str2 = B.Str2;

This joins the items between A and B on their Id-ordered position. But take note: if you have an unequal number of items for each set of Str1 and Str2, you may get unexpected results, since NULLs will appear for #A or #B.

I'm assuming here that you want the first row of table #A's "Str1 Str2", as ordered by #A.Id (1 being first), to correlate with the first row of table #B's "Str1 Str2", as ordered by #B.Id (7 being first), and so on and so forth for each successively numbered row. Is that right?

But what will you do if the number of rows does not match, and there are, for example, 3 rows in #A that have the same values as 2 rows in #B? Or the reverse? What do you want to see?

A mere DISTINCT will not do the job because the data is not duplicated. You are getting what is in effect a partial cross-join (resulting in a partial Cartesian product). That is, your join criteria do not ensure that there is a one-to-one correspondence of #A row to #B row. When that happens, for each row in #A, you will get an output row for each matching row in B. 2 x 2 = 4, not 2.

I think it would help if you were to be a little more concrete in your example. What things are you actually querying? Surely you've simplified for us, but that has also removed all context for us to know what you're trying to accomplish in the real world. If you are trying to line up sports teams, we might give a different answer than if you are trying to line up invoice line items or tardy occurrences or who knows what!

like image 31
ErikE Avatar answered Jan 22 '23 14:01

ErikE