I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]
Take 3 tables A, B [A related to B by some A.AId], and C [B related to C by some B.BId]
If I compose a query e.g
SELECT * FROM A JOIN B ON A.AId = B.AId
All good - I'm sweet with how this works.
What happens when Table C (Or some other D,E, .... gets added)
In the situation
SELECT * FROM A JOIN B ON A.AId = B.AId JOIN C ON C.BId = B.BId
What is C joining to? - is it that B table (and the values therein)? Or is it some other temporary result set that is the result of the A+B Join that the C table is joined to?
[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]
A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:
Tables Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId) Balance (BalanceId) BalanceToken (BalanceId, TokenAmount) Where: Account->Opening, and Closing Balances are NULLABLE (may have opening balance, closing balance, or none) Balance->BalanceToken is 1:m - a balance could consist of many tokens
Conceptually, Closing Balance of a date, would be tomorrows opening balance
If I was trying to find a list of all the opening and closing balances for an account
I might do something like
SELECT AccountId , AccountBalanceDate , Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance , Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance FROM Account A LEFT JOIN BALANCE OpeningBal ON A.OpeningBalanceId = OpeningBal.BalanceId LEFT JOIN BALANCE ClosingBal ON A.ClosingBalanceId = ClosingBal.BalanceId LEFT JOIN BalanceToken openingBalanceAmounts ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId LEFT JOIN BalanceToken closingBalanceAmounts ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId GROUP BY AccountId, AccountBalanceDate
Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.
[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]
I have been told the problem is because the relationship between Balance, and BalanceToken is 1:M - and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.
I know that the example tables do not conform to good DB design
Apologies for the essay, thanks for any elightenment :)
Edit in response to question by Marc
Conceptually for an account there should not be duplicates in BalanceToken for An Account (per AccountingDate) - I think the problem comes about because 1 Account / AccountingDates closing balance is that Accounts opening balance for the next day - so when self joining to Balance, BalanceToken multiple times to get opening and closing balances I think Balances (BalanceId's) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation - hence left joins - an opening (and/or) closing balance may not have been calculated for a given account / accountingdate combination.
In this case the two tables are joined using the relationship table1.id = table2.id . It is possible to use multiple join statements together to join more than one table at the same time. To do that you add a second INNER JOIN statement and a second ON statement to indicate the third table and the second relationship.
Inner Join with Three Tables In this example we use all three of the preceding tables; table1, Table2 and table3 and adding it using an Inner Join. The output will be displayed as a single table which satisfies the join conditions.
Conceptually here is what happens when you join three tables together.
WHERE
clause) to the first table that doesn't involve any of the other tables. It selects out the columns mentioned in the JOIN
conditions or the SELECT
list or the ORDER BY
list. Call this result AORDER BY
This is conceptually what happens. Infact there are many possible optimizations along the way. The advantage of the relational model is that the sound mathematical basis makes various transformations of plan possible while not changing the correctness.
For example, there is really no need to generate the full result sets along the way. The ORDER BY
may instead be done via accessing the data using an index in the first place. There are lots of types of joins that can be done as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With