Understanding how JOIN works when 3 or more tables are involved. [SQL]

Tags:

join

I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]

Take 3 tables A, B [A related to B by some A.AId], and C [B related to C by some B.BId]

If I compose a query, e.g.

SELECT * FROM A JOIN B  ON A.AId = B.AId

All good - I'm sweet with how this works.

What happens when table C (or some other table D, E, ...) gets added?

In the situation

SELECT *
FROM A
    JOIN B ON A.AId = B.AId
    JOIN C ON C.BId = B.BId

What is C joining to? Is it the B table itself (and the values therein)? Or is it some temporary result set, produced by the A+B join, that the C table is joined to?

[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]
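A minimal sketch of the distinction being asked about, using Python's built-in sqlite3 as a stand-in for SQL Server (an assumption; the sample rows are made up for illustration). B contains a row with no matching A row, and the C row that points at it does not survive the three-table join, which is consistent with C being matched against the A+B result rather than against all of B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (AId INTEGER);
    CREATE TABLE B (BId INTEGER, AId INTEGER);
    CREATE TABLE C (CId INTEGER, BId INTEGER);
    INSERT INTO A VALUES (1);
    INSERT INTO B VALUES (10, 1), (20, 2);      -- B row 20 has no matching A row
    INSERT INTO C VALUES (100, 10), (200, 20);  -- C row 200 points at B row 20
""")

# Two-table join: only the B row that matches A survives.
ab = conn.execute(
    "SELECT A.AId, B.BId FROM A JOIN B ON A.AId = B.AId ORDER BY B.BId"
).fetchall()

# Three-table join: C is matched only against B rows that survived A JOIN B.
abc = conn.execute("""
    SELECT A.AId, B.BId, C.CId
    FROM A
        JOIN B ON A.AId = B.AId
        JOIN C ON C.BId = B.BId
    ORDER BY C.CId
""").fetchall()

print(ab)   # [(1, 10)]
print(abc)  # [(1, 10, 100)]
```

C row 200 matches B row 20 on BId, but it is absent from the output because B row 20 was already filtered out by the A/B join condition.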

A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:

Tables:
    Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
    Balance (BalanceId)
    BalanceToken (BalanceId, TokenAmount)

Where:
    Account -> Opening and Closing Balances are NULLABLE (may have opening balance, closing balance, or none)
    Balance -> BalanceToken is 1:m - a balance could consist of many tokens

Conceptually, the closing balance for a date would be tomorrow's opening balance.

If I was trying to find a list of all the opening and closing balances for an account

I might do something like

SELECT AccountId
     , AccountBalanceDate
     , Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
     , Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A
    LEFT JOIN BALANCE OpeningBal
        ON A.OpeningBalanceId = OpeningBal.BalanceId
    LEFT JOIN BALANCE ClosingBal
        ON A.ClosingBalanceId = ClosingBal.BalanceId
    LEFT JOIN BalanceToken openingBalanceAmounts
        ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
    LEFT JOIN BalanceToken closingBalanceAmounts
        ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
GROUP BY AccountId, AccountBalanceDate

Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.

[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]

I have been told the problem is because the relationship between Balance and BalanceToken is 1:M, and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.
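The explanation above can be reproduced on a tiny data set. This is only a sketch: it uses Python's sqlite3 instead of SQL Server, and the rows are invented (one account, an opening balance with 2 tokens, a closing balance with 3 tokens). Each opening token row gets paired with every closing token row, so the joined set has 2 x 3 rows and both sums come out inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (AccountId INTEGER, AccountBalanceDate TEXT,
                          OpeningBalanceId INTEGER, ClosingBalanceId INTEGER);
    CREATE TABLE Balance (BalanceId INTEGER);
    CREATE TABLE BalanceToken (BalanceId INTEGER, TokenAmount INTEGER);
    INSERT INTO Account VALUES (1, '2009-07-05', 1, 2);
    INSERT INTO Balance VALUES (1), (2);
    INSERT INTO BalanceToken VALUES (1, 10), (1, 20);        -- opening: true total 30
    INSERT INTO BalanceToken VALUES (2, 5), (2, 7), (2, 9);  -- closing: true total 21
""")

# Same join chain as the query in the question.
joins = """
    FROM Account A
        LEFT JOIN Balance OpeningBal
            ON A.OpeningBalanceId = OpeningBal.BalanceId
        LEFT JOIN Balance ClosingBal
            ON A.ClosingBalanceId = ClosingBal.BalanceId
        LEFT JOIN BalanceToken openingBalanceAmounts
            ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
        LEFT JOIN BalanceToken closingBalanceAmounts
            ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
"""

# Rows in the joined set before GROUP BY collapses them.
row_count = conn.execute("SELECT COUNT(*) " + joins).fetchone()[0]

# The sums computed over that joined set.
sums = conn.execute(
    "SELECT SUM(openingBalanceAmounts.TokenAmount), "
    "SUM(closingBalanceAmounts.TokenAmount) " + joins
).fetchone()

print(row_count)  # 6  (2 opening tokens x 3 closing tokens)
print(sums)       # (90, 42) instead of the true (30, 21)
```

The opening total is counted 3 times (once per closing token) and the closing total twice (once per opening token), which matches the "already brought in multiple times" explanation.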

I know that the example tables do not conform to good DB design.

Apologies for the essay, thanks for any enlightenment :)

Edit in response to question by Marc

Conceptually, there should not be duplicates in BalanceToken for an account (per AccountingDate). I think the problem comes about because one account/AccountingDate's closing balance is that account's opening balance for the next day, so when joining to Balance and BalanceToken multiple times to get opening and closing balances, I think balances (BalanceIds) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation, hence the left joins: an opening and/or closing balance may not have been calculated for a given account/AccountingDate combination.
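One possible way to avoid the row multiplication, sketched here as an alternative and not something proposed in the thread: total the tokens per BalanceId in a derived table first, so each join adds at most one row per account/date. The example again uses Python's sqlite3 as a stand-in for SQL Server, the alias names `o` and `c` are hypothetical, and the data is invented (day one's closing balance is day two's opening balance, and day two has no closing balance yet).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (AccountId INTEGER, AccountBalanceDate TEXT,
                          OpeningBalanceId INTEGER, ClosingBalanceId INTEGER);
    CREATE TABLE BalanceToken (BalanceId INTEGER, TokenAmount INTEGER);
    INSERT INTO Account VALUES (1, '2009-07-04', 1, 2);
    INSERT INTO Account VALUES (1, '2009-07-05', 2, NULL);   -- closing not calculated yet
    INSERT INTO BalanceToken VALUES (1, 10), (1, 20);        -- balance 1 totals 30
    INSERT INTO BalanceToken VALUES (2, 5), (2, 7), (2, 9);  -- balance 2 totals 21
""")

# Aggregate tokens per balance BEFORE joining, so each join is at most 1:1.
rows = conn.execute("""
    SELECT A.AccountId, A.AccountBalanceDate,
           o.Total AS OpeningBalance,
           c.Total AS ClosingBalance
    FROM Account A
        LEFT JOIN (SELECT BalanceId, SUM(TokenAmount) AS Total
                   FROM BalanceToken GROUP BY BalanceId) o
            ON o.BalanceId = A.OpeningBalanceId
        LEFT JOIN (SELECT BalanceId, SUM(TokenAmount) AS Total
                   FROM BalanceToken GROUP BY BalanceId) c
            ON c.BalanceId = A.ClosingBalanceId
    ORDER BY A.AccountBalanceDate
""").fetchall()

print(rows)
# [(1, '2009-07-04', 30, 21), (1, '2009-07-05', 21, None)]
```

Because the derived tables have one row per BalanceId, no DISTINCT is needed, and a missing closing balance still shows up as NULL thanks to the left joins.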

553

asked Jul 05 '09 08:07

Delaney

1 Answer

Conceptually here is what happens when you join three tables together.

1. The optimizer comes up with a plan, which includes a join order. It could be A, B, C, or C, B, A, or any of the other combinations.
2. The query execution engine applies to the first table any predicates (WHERE clause) that don't involve any of the other tables. It selects the columns mentioned in the JOIN conditions, the SELECT list, or the ORDER BY list. Call this result A.
3. It joins this result set to the second table. For each row, it joins to the second table, applying any predicates that apply to the second table. This produces another temporary result set.
4. Then it joins in the final table and applies the ORDER BY.
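The steps above can be sketched in plain Python as a chain of naive nested-loop joins. This is only an illustration of the conceptual model, not how any real engine is implemented; the helper `nested_loop_join` and the sample rows are made up.

```python
# Hypothetical sample data: each table is a list of row dictionaries.
A = [{"AId": 1}, {"AId": 2}]
B = [{"BId": 10, "AId": 1}, {"BId": 11, "AId": 1}, {"BId": 20, "AId": 3}]
C = [{"CId": 100, "BId": 10}, {"CId": 101, "BId": 11}, {"CId": 102, "BId": 20}]


def nested_loop_join(left_rows, right_rows, predicate):
    """Return a merged row for every (left, right) pair that satisfies predicate."""
    return [
        {**left, **right}
        for left in left_rows
        for right in right_rows
        if predicate(left, right)
    ]


# Join the first table to the second: this is the temporary result set.
ab = nested_loop_join(A, B, lambda a, b: a["AId"] == b["AId"])

# Join the temporary result set to the final table.
abc = nested_loop_join(ab, C, lambda ab_row, c: ab_row["BId"] == c["BId"])

# Apply the ORDER BY last.
abc_sorted = sorted(abc, key=lambda row: row["CId"])

print(ab)          # [{'AId': 1, 'BId': 10}, {'AId': 1, 'BId': 11}]
print(abc_sorted)  # [{'AId': 1, 'BId': 10, 'CId': 100}, {'AId': 1, 'BId': 11, 'CId': 101}]
```

B row 20 never reaches the second join because it had no partner in A, so C row 102 finds nothing to match in the intermediate result.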

This is conceptually what happens. In fact, there are many possible optimizations along the way. The advantage of the relational model is that its sound mathematical basis makes various transformations of the plan possible without changing the correctness of the result.
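As a small illustration of a transformation that does not change the result, the sketch below writes the same three-table inner join in two different orders and compares the rows. It uses Python's sqlite3 as a stand-in (an assumption), with invented data; it shows only that the written order of inner joins does not affect the result set, not which order an optimizer would actually pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (AId INTEGER);
    CREATE TABLE B (BId INTEGER, AId INTEGER);
    CREATE TABLE C (CId INTEGER, BId INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (10, 1), (11, 1), (20, 2), (30, 3);
    INSERT INTO C VALUES (100, 10), (101, 10), (102, 20), (103, 40);
""")

# Written order A, B, C.
abc = conn.execute("""
    SELECT A.AId, B.BId, C.CId
    FROM A
        JOIN B ON A.AId = B.AId
        JOIN C ON C.BId = B.BId
""").fetchall()

# Written order C, B, A with the same join conditions.
cba = conn.execute("""
    SELECT A.AId, B.BId, C.CId
    FROM C
        JOIN B ON C.BId = B.BId
        JOIN A ON A.AId = B.AId
""").fetchall()

# Without ORDER BY the row order is unspecified, so compare sorted copies.
print(sorted(abc) == sorted(cba))  # True
print(sorted(abc))                 # [(1, 10, 100), (1, 10, 101), (2, 20, 102)]
```

Note that this equivalence is shown here for inner joins only; with outer joins such as the LEFT JOINs in the question, swapping the tables around generally changes the meaning of the query.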

For example, there is really no need to generate the full result sets along the way. The ORDER BY may instead be satisfied by reading the data through an index in the first place. There are also several physical join algorithms (such as nested loops, merge, and hash joins) that the engine can choose from.

105

answered Sep 29 '22 01:09

WW.
