
Understanding how JOIN works when 3 or more tables are involved. [SQL]

Tags: sql, join

I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]

Take 3 tables A, B [A related to B by some A.AId], and C [B related to C by some B.BId]

If I compose a query, e.g.

SELECT * FROM A JOIN B  ON A.AId = B.AId 

All good - I'm sweet with how this works.

What happens when table C (or some other D, E, ...) gets added?

In the situation

SELECT * FROM A JOIN B    ON A.AId = B.AId JOIN C ON C.BId = B.BId 

What is C joining to? - is it that B table (and the values therein)? Or is it some other temporary result set that is the result of the A+B Join that the C table is joined to?

[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]
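The point in the bracketed note above can be checked with a small experiment. This is a minimal sketch, not SQL Server itself: it uses Python's built-in sqlite3 module as a stand-in, and the sample rows are made up for illustration. B contains a row (BId 20) with no matching row in A, and C has a row pointing at that BId.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (AId INTEGER);
    CREATE TABLE B (BId INTEGER, AId INTEGER);
    CREATE TABLE C (CId INTEGER, BId INTEGER);
    INSERT INTO A VALUES (1);
    -- B row 20 points at AId 2, which does not exist in A.
    INSERT INTO B VALUES (10, 1), (20, 2);
    -- C row 200 points at B row 20.
    INSERT INTO C VALUES (100, 10), (200, 20);
""")

rows = conn.execute("""
    SELECT A.AId, B.BId, C.CId
    FROM A
    JOIN B ON A.AId = B.AId
    JOIN C ON C.BId = B.BId
    ORDER BY C.CId
""").fetchall()

print(rows)  # [(1, 10, 100)]
```

C row 200 matches a row in the B table, but it does not appear in the output, because B row 20 was already excluded by the A-to-B join condition. Logically, C is matched against the rows that survive the A+B join, not against the whole B table.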

A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:

Tables:
  Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
  Balance (BalanceId)
  BalanceToken (BalanceId, TokenAmount)

Where:
  Account -> Opening and Closing Balances are NULLABLE (may have opening balance, closing balance, or none)
  Balance -> BalanceToken is 1:m - a balance could consist of many tokens

Conceptually, the closing balance for a date would be tomorrow's opening balance.

If I were trying to find a list of all the opening and closing balances for an account, I might do something like

SELECT AccountId
     , AccountBalanceDate
     , SUM(openingBalanceAmounts.TokenAmount) AS OpeningBalance
     , SUM(closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A
    LEFT JOIN BALANCE OpeningBal
        ON A.OpeningBalanceId = OpeningBal.BalanceId
    LEFT JOIN BALANCE ClosingBal
        ON A.ClosingBalanceId = ClosingBal.BalanceId
    LEFT JOIN BalanceToken openingBalanceAmounts
        ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
    LEFT JOIN BalanceToken closingBalanceAmounts
        ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
GROUP BY AccountId, AccountBalanceDate

Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.

[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]

I have been told the problem is because the relationship between Balance, and BalanceToken is 1:M - and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.
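The explanation above can be reproduced on a tiny data set. The following is a sketch under stated assumptions: SQLite (via Python's sqlite3 module) stands in for SQL Server, the rows are invented, and the extra COUNT(*) column is added only to make the row multiplication visible. The second query, which sums tokens per balance before joining, is one common way to avoid the multiplication; it is offered as an illustration, not as the asker's or answerer's method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (AccountId INTEGER, AccountBalanceDate TEXT,
                          OpeningBalanceId INTEGER, ClosingBalanceId INTEGER);
    CREATE TABLE Balance (BalanceId INTEGER);
    CREATE TABLE BalanceToken (BalanceId INTEGER, TokenAmount INTEGER);
    -- Hypothetical data: one account, opening balance 1, closing balance 2.
    INSERT INTO Account VALUES (1, '2009-07-05', 1, 2);
    INSERT INTO Balance VALUES (1), (2);
    -- Balance 1 has 2 tokens (true total 30); balance 2 has 3 tokens (true total 6).
    INSERT INTO BalanceToken VALUES (1, 10), (1, 20), (2, 1), (2, 2), (2, 3);
""")

# Same join shape as the query in the question.
naive = conn.execute("""
    SELECT A.AccountId,
           SUM(openingBalanceAmounts.TokenAmount) AS OpeningBalance,
           SUM(closingBalanceAmounts.TokenAmount) AS ClosingBalance,
           COUNT(*) AS JoinedRows
    FROM Account A
    LEFT JOIN Balance OpeningBal ON A.OpeningBalanceId = OpeningBal.BalanceId
    LEFT JOIN Balance ClosingBal ON A.ClosingBalanceId = ClosingBal.BalanceId
    LEFT JOIN BalanceToken openingBalanceAmounts
           ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
    LEFT JOIN BalanceToken closingBalanceAmounts
           ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
    GROUP BY A.AccountId
""").fetchone()

# Alternative: collapse BalanceToken to one row per balance before joining.
fixed = conn.execute("""
    SELECT A.AccountId, o.Total AS OpeningBalance, c.Total AS ClosingBalance
    FROM Account A
    LEFT JOIN (SELECT BalanceId, SUM(TokenAmount) AS Total
               FROM BalanceToken GROUP BY BalanceId) o
           ON o.BalanceId = A.OpeningBalanceId
    LEFT JOIN (SELECT BalanceId, SUM(TokenAmount) AS Total
               FROM BalanceToken GROUP BY BalanceId) c
           ON c.BalanceId = A.ClosingBalanceId
""").fetchone()

print(naive)  # (1, 90, 12, 6)
print(fixed)  # (1, 30, 6)
```

In the first query each of the 2 opening tokens is paired with each of the 3 closing tokens, giving 6 joined rows, so the opening sum is tripled (90 instead of 30) and the closing sum is doubled (12 instead of 6). Once each balance contributes a single pre-summed row, the joins no longer multiply.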

I know that the example tables do not conform to good DB design.

Apologies for the essay, and thanks for any enlightenment :)

Edit in response to question by Marc

Conceptually, there should not be duplicates in BalanceToken for an account (per AccountingDate). I think the problem comes about because one account's closing balance for a given AccountingDate is that account's opening balance for the next day. So when self-joining to Balance and BalanceToken multiple times to get opening and closing balances, I think balances (BalanceIds) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation, hence the left joins: an opening and/or closing balance may not have been calculated for a given account/AccountingDate combination.

Delaney asked Jul 05 '09 08:07




1 Answer

Conceptually here is what happens when you join three tables together.

  1. The optimizer comes up with a plan, which includes a join order. It could be A, B, C, or C, B, A, or any other combination.
  2. The query execution engine applies any predicates (WHERE clause) to the first table that don't involve any of the other tables. It selects out the columns mentioned in the JOIN conditions, the SELECT list, or the ORDER BY list. Call this result A.
  3. It joins this result set to the second table. For each row it joins to the second table, applying any predicates that may apply to the second table. This results in another temporary result set.
  4. Then it joins in the final table and applies the ORDER BY.
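The steps above can be sketched in a few lines of Python. This is only a conceptual model using a naive nested-loop join over in-memory rows; the helper name `nested_loop_join` and the sample data are hypothetical, and a real engine would choose among several join algorithms rather than materialise every intermediate result like this.

```python
# Hypothetical rows as dicts; names and values are for illustration only.
A = [{"AId": 1}, {"AId": 2}]
B = [{"BId": 10, "AId": 1}, {"BId": 11, "AId": 1}, {"BId": 20, "AId": 3}]
C = [{"CId": 100, "BId": 10}, {"CId": 101, "BId": 11}, {"CId": 200, "BId": 20}]


def nested_loop_join(left, right, predicate):
    """Pair every left row with every right row that satisfies the predicate."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if predicate(left_row, right_row)
    ]


# Step 2: start from the first table (no WHERE clause here, so nothing is filtered).
first = list(A)

# Step 3: join to the second table, producing an intermediate result set.
a_b = nested_loop_join(first, B, lambda l, r: l["AId"] == r["AId"])

# Step 4: join the final table to that intermediate result, then apply the ORDER BY.
a_b_c = nested_loop_join(a_b, C, lambda l, r: l["BId"] == r["BId"])
result = sorted(a_b_c, key=lambda row: row["CId"])

for row in result:
    print(row)
```

B row 20 never reaches the second join because it has no partner in A, so C row 200 has nothing to attach to. The third table is matched against the intermediate result, not against B directly.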

This is conceptually what happens. In fact, there are many possible optimizations along the way. The advantage of the relational model is that its sound mathematical basis makes various transformations of the plan possible without changing the correctness of the result.
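As a small illustration of why reordering is safe, the sketch below runs the same inner joins written in two different orders and compares the results. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, and the data is made up. Note that this applies to inner joins; outer joins cannot be reordered as freely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (AId INTEGER);
    CREATE TABLE B (BId INTEGER, AId INTEGER);
    CREATE TABLE C (CId INTEGER, BId INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (10, 1), (11, 2), (20, 3);
    INSERT INTO C VALUES (100, 10), (101, 11), (102, 11), (200, 20);
""")

# Written as A, then B, then C.
abc = conn.execute("""
    SELECT A.AId, B.BId, C.CId
    FROM A
    JOIN B ON A.AId = B.AId
    JOIN C ON C.BId = B.BId
""").fetchall()

# Written as C, then B, then A.
cba = conn.execute("""
    SELECT A.AId, B.BId, C.CId
    FROM C
    JOIN B ON C.BId = B.BId
    JOIN A ON A.AId = B.AId
""").fetchall()

# Without an ORDER BY the row order is unspecified, so compare as sorted lists.
print(sorted(abc) == sorted(cba))  # True
print(sorted(abc))  # [(1, 10, 100), (2, 11, 101), (2, 11, 102)]
```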

For example, there is really no need to generate the full result sets along the way. The ORDER BY may instead be done via accessing the data using an index in the first place. There are lots of types of joins that can be done as well.

WW. answered Sep 29 '22 01:09
