I have the following SQL statement in a legacy system I'm refactoring. It is an abbreviated view for the purposes of this question, just returning count(*) for the time being.
SELECT COUNT(*) FROM Table1 INNER JOIN Table2 INNER JOIN Table3 ON Table2.Key = Table3.Key AND Table2.Key2 = Table3.Key2 ON Table1.DifferentKey = Table3.DifferentKey
It is generating a very large number of records and killing the system, but could someone please explain the syntax? And can this be expressed in any other way?
EDIT:
Suggested reformat
SELECT COUNT(*) FROM Table1 INNER JOIN Table3 ON Table1.DifferentKey = Table3.DifferentKey INNER JOIN Table2 ON Table2.Key = Table3.Key AND Table2.Key2 = Table3.Key2
The T-SQL language allows us to join multiple tables together in a single query. After the first table, each additional one requires its own join statement and its own join condition.
A Nested Loops join works the way nested loops do in code. One of the joining tables is designated as the outer table and the other as the inner table. For each row of the outer table, the rows of the inner table are checked one by one; if a row matches, it is included in the result set, otherwise it is ignored.
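A minimal sketch of that loop structure in Python, just to make the description concrete. The function name, the row layout and the sample rows are all hypothetical, and a real database engine does far more than this (indexes, statistics, other join algorithms).

```python
def nested_loops_join(outer_rows, inner_rows, matches):
    """Return every (outer, inner) pair for which matches(outer, inner) is true."""
    result = []
    for outer in outer_rows:        # outer table: scanned once
        for inner in inner_rows:    # inner table: scanned once per outer row
            if matches(outer, inner):
                result.append((outer, inner))
    return result


# Hypothetical sample rows in the form (DifferentKey, value)
table1 = [(1, "X"), (1, "Y"), (2, "Z")]
table3 = [(1, "V"), (1, "W"), (3, "Q")]

joined = nested_loops_join(table1, table3, lambda o, i: o[0] == i[0])
print(len(joined))  # 4: two Table1 rows with key 1, each paired with two Table3 rows
```

Note that the work grows with the product of the two row counts, which is why duplicated keys hurt so much.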
An SQL INNER JOIN is the same as a plain JOIN clause: it combines rows from two or more tables. An inner join of A and B keeps only the rows that have a match on both sides, often pictured as A intersect B, i.e. the inner part of a Venn diagram intersection. Inner joins use a comparison operator to match rows from two tables based on the values in common columns from each table.
The syntax for multiple joins:
SELECT column_name1, column_name2, ...
FROM table_name1
INNER JOIN table_name2 ON condition_1
INNER JOIN table_name3 ON condition_2
INNER JOIN table_name4 ON condition_3
...
Note: While selecting only particular columns use table_name.
For readability, I restructured the query, starting with the apparent top-most level being Table1, which then ties to Table3, and then Table3 ties to Table2. It is much easier to follow if you follow the chain of relationships.
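As for the syntax itself: in T-SQL each ON pairs with the nearest unmatched JOIN, so the original statement reads as Table1 INNER JOIN (Table2 INNER JOIN Table3 ON ...) ON .... Because these are all inner joins, that grouping returns the same rows as the flat chain. Here is a rough check of that, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server. SQLite needs the parentheses written out, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (DifferentKey INTEGER, OtherVal TEXT);
    CREATE TABLE Table2 (Key INTEGER, Key2 INTEGER, OtherVal TEXT);
    CREATE TABLE Table3 (DifferentKey INTEGER, Key INTEGER, Key2 INTEGER, Tbl3Other TEXT);
""")
# Hypothetical rows, including some that find no match
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(1, "a"), (1, "b"), (2, "c")])
conn.executemany("INSERT INTO Table3 VALUES (?, ?, ?, ?)",
                 [(1, 2, 6, "p"), (1, 2, 6, "q"), (2, 9, 9, "r")])
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)",
                 [(2, 6, "m"), (2, 6, "n"), (3, 7, "o")])

# The original nested form, with the implied grouping made explicit
nested_sql = """
    SELECT COUNT(*)
    FROM Table1
    INNER JOIN (Table2
                INNER JOIN Table3
                   ON Table2.Key = Table3.Key AND Table2.Key2 = Table3.Key2)
       ON Table1.DifferentKey = Table3.DifferentKey
"""
# The restructured chain: Table1 -> Table3 -> Table2
flat_sql = """
    SELECT COUNT(*)
    FROM Table1
    INNER JOIN Table3 ON Table1.DifferentKey = Table3.DifferentKey
    INNER JOIN Table2 ON Table3.Key = Table2.Key AND Table3.Key2 = Table2.Key2
"""
nested_count = conn.execute(nested_sql).fetchone()[0]
flat_count = conn.execute(flat_sql).fetchone()[0]
print(nested_count, flat_count)  # 8 8
```

So the reformat changes readability only, not the result; the large count comes from the data, not from the nesting.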
Now, to answer your question. You are getting a large count because the joins multiply rows, much like a Cartesian product. If X records in Table1 share a key with Y records in Table3, that join alone produces X * Y records. The match between Table3 and Table2 then has the same impact: with Z matching records in Table2, the result for just one possible ID in Table1 can have X * Y * Z records.
This is based on not knowing how your tables are normalized or what they contain, for example whether the key is a PRIMARY key or not.
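One way to find out whether a join key is actually unique is to group by it and look for groups with more than one row. A small sketch, again assuming SQLite via Python's sqlite3 module and made-up sample rows; the GROUP BY / HAVING query itself is plain SQL that should carry over to your system largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table2 (Key INTEGER, Key2 INTEGER, OtherVal TEXT)")
# Hypothetical rows: (2, 6) appears three times, (3, 7) only once
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?, ?)",
    [(2, 6, "a"), (2, 6, "b"), (2, 6, "c"), (3, 7, "d")],
)

# Any row returned here is a key value that will multiply rows in a join
duplicate_keys = conn.execute("""
    SELECT Key, Key2, COUNT(*) AS rows_per_key
    FROM Table2
    GROUP BY Key, Key2
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicate_keys)  # [(2, 6, 3)]
```

If this comes back empty for a table, the key is unique there and that table cannot be the one inflating the count.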
Ex:

Table 1
DiffKey  Other Val
1        X
1        Y
1        Z

Table 3
DiffKey  Key  Key2  Tbl3 Other
1        2    6     V
1        2    6     X
1        2    6     Y
1        2    6     Z

Table 2
Key  Key2  Other Val
2    6     a
2    6     b
2    6     c
2    6     d
2    6     e
So, Table 1 joining to Table 3 will result (in this scenario) in 12 records (each of the 3 rows in Table 1 joined with each of the 4 rows in Table 3). Then each of those is repeated for every matched record in Table 2 (5 records), so a total count of 60 (3 tbl1 * 4 tbl3 * 5 tbl2) would be returned.
So, now, take that and expand it to your thousands of records and you can see how a messed-up structure could choke a cow (so to speak) and kill performance.
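If you want to check that arithmetic, here is a quick sketch that loads the sample rows from the example and runs both joins. It assumes SQLite through Python's sqlite3 module rather than SQL Server, and the column names without spaces (OtherVal, Tbl3Other) are my own stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (DifferentKey INTEGER, OtherVal TEXT);
    CREATE TABLE Table3 (DifferentKey INTEGER, Key INTEGER, Key2 INTEGER, Tbl3Other TEXT);
    CREATE TABLE Table2 (Key INTEGER, Key2 INTEGER, OtherVal TEXT);
""")
# Sample rows from the example: 3 rows, 4 rows and 5 rows, all sharing the same keys
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [(1, v) for v in "XYZ"])
conn.executemany("INSERT INTO Table3 VALUES (?, ?, ?, ?)", [(1, 2, 6, v) for v in "VXYZ"])
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)", [(2, 6, v) for v in "abcde"])

# Step 1: Table1 joined to Table3 only
count_1_3 = conn.execute("""
    SELECT COUNT(*)
    FROM Table1
    INNER JOIN Table3 ON Table1.DifferentKey = Table3.DifferentKey
""").fetchone()[0]

# Step 2: the full chain, adding Table2
count_all = conn.execute("""
    SELECT COUNT(*)
    FROM Table1
    INNER JOIN Table3 ON Table1.DifferentKey = Table3.DifferentKey
    INNER JOIN Table2 ON Table3.Key = Table2.Key AND Table3.Key2 = Table2.Key2
""").fetchone()[0]

print(count_1_3, count_all)  # 12 60
```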
SELECT COUNT(*) FROM Table1 INNER JOIN Table3 ON Table1.DifferentKey = Table3.DifferentKey INNER JOIN Table2 ON Table3.Key = Table2.Key AND Table3.Key2 = Table2.Key2