I'm having a performance problem.
I created a table that receives data from a file via BULK INSERT. Then I run a SELECT with multiple INNER JOINs (11 tables joined together) to insert the right data into another table.
When I run this SELECT, it takes too long (more than an hour), so I end up stopping it. My solution was to break this query into 3 parts using @temp tables. To my surprise, that takes 3 minutes. That's what I'm trying to understand: WHY was breaking my query into 3 parts FASTER than one SELECT statement? Here is my query:
SELECT t1.ReturnINT, t1.ReturnBIT, t2.ReturnINT, t3.ReturnINT, t5.ReturnINT, t1.ReturnDateTime
FROM t1
INNER JOIN t2
ON t2.my_column_varchar = t1.my_column_varchar
INNER JOIN t3
ON t3.my_column_number = t1.my_column_number AND t2.my_column_ID = t3.my_column_ID
INNER JOIN t4
ON t4.my_column_varchar = t1.my_column_varchar
INNER JOIN t5
ON t5.my_column_int = t1.my_column_int AND t5.my_column_int = t4.my_column_int AND t2.my_column_int = t5.my_column_int
INNER JOIN t6
ON t6.my_column_int = t5.my_column_int AND t6.my_column_int = t2.my_column_int
INNER JOIN t7
ON t7.my_column_int = t6.my_column_int
INNER JOIN t8
ON t8.my_column_int = t3.my_column_int AND t8.my_column_datetime = t1.my_column_datetime
INNER JOIN t9
ON t9.my_column_int = t3.my_column_int AND t8.my_column_datetime BETWEEN t9.my_column_datetime1 AND t9.datetime1 + t9.my_column_datetime2
INNER JOIN t10
ON t10.my_column_int = t9.my_column_int AND t10.my_column_int = t6.my_column_int
INNER JOIN t11
ON t11.my_column_int = t9.my_column_int AND t8.my_column_datetime = t11.my_column_datetime
----EDITED----
There is NO WHERE clause; my query is exactly as shown here.
Here are my split queries (I forgot to include them earlier). Together they run in 3 minutes.
DECLARE @temp TABLE (
<Some_columns>
)
INSERT INTO @temp
SELECT <My_Linked_Columns>
FROM t1
INNER JOIN t2
ON t2.my_column_varchar = t1.my_column_varchar
INNER JOIN t3
ON t3.my_column_number = t1.my_column_number AND t2.my_column_ID = t3.my_column_ID
INNER JOIN t4
ON t4.my_column_varchar = t1.my_column_varchar
INNER JOIN t5
ON t5.my_column_int = t1.my_column_int AND t5.my_column_int = t4.my_column_int AND t2.my_column_int = t5.my_column_int
DECLARE @temp2 TABLE(
<Some_Columns>
)
INSERT INTO @temp2
SELECT <More_Linked_Columns>
FROM @temp as temp
INNER JOIN t6
ON t6.my_column_int = temp.my_column_int AND t6.my_column_int = temp.my_column_int
INNER JOIN t7
ON t7.my_column_int = t6.my_column_int
INNER JOIN t8
ON t8.my_column_int = temp.my_column_int AND t8.my_column_datetime = temp.my_column_datetime
DECLARE @temp3 TABLE(
<Some_Columns>
)
INSERT INTO @temp3
SELECT <More_Linked_Columns>
FROM @temp2 AS temp2
INNER JOIN t9
ON t9.my_column_int = temp2.my_column_int AND temp2.my_column_datetime BETWEEN t9.my_column_datetime1 AND t9.datetime1 + t9.my_column_datetime2
INNER JOIN t10
ON t10.my_column_int = t9.my_column_int AND t10.my_column_int = temp2.my_column_int
INNER JOIN t11
ON t11.my_column_int = t9.my_column_int AND temp2.my_column_datetime = t11.my_column_datetime
SELECT <All_Final_Columns>
FROM @temp3
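To check that staging the joins in intermediate tables returns the same rows as the single statement, here is a minimal sketch using Python's built-in sqlite3 module. It is SQLite, not SQL Server: SQLite TEMP tables stand in for the @temp table variables, and the tables a, b, c, d and their columns are hypothetical, not the asker's schema. The sketch only demonstrates that the two forms are equivalent; the speedup in SQL Server plausibly comes from the optimizer working on smaller, already-materialized intermediate results.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Create a tiny hypothetical schema: a -> b -> c -> d, chained on key columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, k INTEGER);
        CREATE TABLE b (k INTEGER, j INTEGER);
        CREATE TABLE c (j INTEGER, m INTEGER);
        CREATE TABLE d (m INTEGER, label TEXT);
        INSERT INTO a (id, k) VALUES (1, 10), (2, 20), (3, 30);
        INSERT INTO b VALUES (10, 100), (20, 200), (30, 300);
        INSERT INTO c VALUES (100, 1000), (200, 2000);
        INSERT INTO d VALUES (1000, 'x'), (2000, 'y');
    """)
    return conn

def single_statement(conn: sqlite3.Connection) -> list:
    # Everything joined in one query, like the original multi-table SELECT.
    return sorted(conn.execute("""
        SELECT a.id, d.label
        FROM a
        JOIN b ON b.k = a.k
        JOIN c ON c.j = b.j
        JOIN d ON d.m = c.m
    """).fetchall())

def staged(conn: sqlite3.Connection) -> list:
    # Stage 1: materialize the first join into a temp table.
    conn.execute("""
        CREATE TEMP TABLE stage1 AS
        SELECT a.id, b.j FROM a JOIN b ON b.k = a.k
    """)
    # Stage 2: join the materialized intermediate result onward.
    conn.execute("""
        CREATE TEMP TABLE stage2 AS
        SELECT s.id, c.m FROM stage1 AS s JOIN c ON c.j = s.j
    """)
    # Final stage: the last join produces the same columns as the single query.
    return sorted(conn.execute("""
        SELECT s.id, d.label FROM stage2 AS s JOIN d ON d.m = s.m
    """).fetchall())

conn = build_db()
print(single_statement(conn))
print(staged(conn))
```

Because staged() creates named temp tables, call it once per connection.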
----EDITED 3----
After studying further, I found a problem in the execution plan: a Nested Loops operator estimates 1 row but actually returns 1,204,014 rows. I suspect this is exactly where the problem is, but I haven't found a way to solve it without breaking my query into 3 parts. (Now I know why breaking it up is faster, hehehe.)
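When an estimate is that far off, it can help to count the rows after each additional join to see which one multiplies the result. In SQL Server the actual execution plan already reports actual rows per operator; the sketch below shows the same idea by hand with Python's sqlite3. The tables t1, t2, t3 and their data are hypothetical, chosen so that one join is many-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (k INTEGER);
    CREATE TABLE t2 (k INTEGER, j INTEGER);
    CREATE TABLE t3 (j INTEGER);
""")
# t1: 100 rows with k = 0..99
conn.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(100)])
# t2: exactly one row per k, so joining it keeps 100 rows
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(i, i % 5) for i in range(100)])
# t3: j is NOT unique (each of 0..4 appears 50 times), so this join multiplies rows
conn.executemany("INSERT INTO t3 VALUES (?)", [(i % 5,) for i in range(250)])

# The FROM clause grown one join at a time (hypothetical chain).
steps = [
    "FROM t1",
    "FROM t1 JOIN t2 ON t2.k = t1.k",
    "FROM t1 JOIN t2 ON t2.k = t1.k JOIN t3 ON t3.j = t2.j",
]
counts = [conn.execute(f"SELECT COUNT(*) {s}").fetchone()[0] for s in steps]
for s, n in zip(steps, counts):
    print(f"{n:>6}  {s}")
```

The jump from 100 to 5000 rows points at the join on t3 as the one to inspect.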
How do you prevent duplicate rows when joining multiple tables? One solution: select the column values in a consistent order within each row, so that rows containing the same set of values become identical. Then you can use SELECT DISTINCT to remove the duplicates.
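A minimal sketch of that idea with Python's sqlite3: a self-join on a hypothetical person table returns every same-city pair twice, once as (A, B) and once as (B, A). SQLite's multi-argument scalar min() and max() put each pair's names in a fixed order, after which DISTINCT collapses the duplicates. In SQL Server a CASE expression could play the same ordering role; that translation is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [
    ("Ana", "Lisbon"), ("Bruno", "Lisbon"),
    ("Carla", "Porto"), ("Davi", "Porto"),
])

# Self-join: every pair of people in the same city, in BOTH orders.
raw = conn.execute("""
    SELECT p.name, q.name
    FROM person AS p
    JOIN person AS q ON q.city = p.city AND q.name <> p.name
""").fetchall()

# Put the smaller name first in every row, then DISTINCT removes the mirrored copies.
dedup = conn.execute("""
    SELECT DISTINCT min(p.name, q.name) AS name1, max(p.name, q.name) AS name2
    FROM person AS p
    JOIN person AS q ON q.city = p.city AND q.name <> p.name
    ORDER BY name1, name2
""").fetchall()

print(len(raw))  # 4
print(dedup)     # [('Ana', 'Bruno'), ('Carla', 'Davi')]
```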
I won't leave you in suspense: between joins and subqueries, joins tend to execute faster. In fact, a query that uses joins will usually retrieve results faster than an equivalent query that uses a subquery. The reason is that a join can reduce the processing burden on the database by replacing multiple queries with a single join query.
Most common reasons:
Reason 1: When two tables with n and m rows participating in an INNER JOIN have a many-to-many relationship, the INNER JOIN can approach a CROSS JOIN and produce a result set with more than MAX(n, m) rows; theoretically, up to n × m rows are possible.
Now imagine many such tables in one INNER JOIN chain.
The result set keeps growing and starts eating into the allocated memory.
This could be a reason why temp tables might help you.
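The n × m bound is easy to see with a toy example. This sketch (Python's sqlite3; the tables lt and rt and the column k are hypothetical) spreads n and m rows over a chosen number of distinct key values and counts the INNER JOIN output. With a single shared key value, the join degenerates into a CROSS JOIN.

```python
import sqlite3

def join_size(n: int, m: int, distinct_keys: int) -> int:
    """Rows produced by an INNER JOIN when n and m rows are spread over distinct_keys values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lt (k INTEGER)")
    conn.execute("CREATE TABLE rt (k INTEGER)")
    conn.executemany("INSERT INTO lt VALUES (?)", [(i % distinct_keys,) for i in range(n)])
    conn.executemany("INSERT INTO rt VALUES (?)", [(i % distinct_keys,) for i in range(m)])
    return conn.execute("SELECT COUNT(*) FROM lt JOIN rt ON rt.k = lt.k").fetchone()[0]

print(join_size(200, 200, 200))  # unique keys, one-to-one: 200 rows
print(join_size(200, 200, 10))   # 20 rows per key on each side: 10 * 20 * 20 = 4000 rows
print(join_size(200, 200, 1))    # one shared key: 200 * 200 = 40000 rows, a CROSS JOIN
```

Chaining several joins like the middle case multiplies these factors together, which is how an intermediate result can reach millions of rows.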
Reason 2: You do not have an INDEX built on the columns you are joining the tables on.
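SQLite's EXPLAIN QUERY PLAN shows the effect of a missing join index in a few lines. This is a sketch with hypothetical orders and customers tables; SQL Server execution plans show the analogous difference as scans versus index seeks, but the code below only exercises SQLite. Automatic indexes are switched off so that SQLite does not build a temporary index on its own and hide the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
""")
# Disable SQLite's automatic temporary indexes so the missing index is visible.
conn.execute("PRAGMA automatic_index = OFF")

QUERY = """
    SELECT o.id, c.name
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
"""

def plan(sql: str) -> list:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)
conn.execute("CREATE INDEX idx_customers_customer_id ON customers (customer_id)")
after = plan(QUERY)

print(before)  # full scans only (exact wording varies by SQLite version)
print(after)   # customers is now searched through idx_customers_customer_id
```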
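Reason 3: Do you have functions in the WHERE clause? Wrapping a column in a function usually prevents the optimizer from using an index on that column.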
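A minimal sketch of Reason 3 with Python's sqlite3, assuming a hypothetical events table with an index on a text timestamp column. Wrapping the column in date() forces a full scan, while comparing the bare column against a half-open range returns the same rows and lets SQLite search the index. The same rewrite idea (keep the indexed column bare on one side of the comparison) applies in SQL Server, though this code only runs against SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, happened_at TEXT, amount REAL);
    CREATE INDEX idx_events_happened_at ON events (happened_at);
""")
conn.executemany(
    "INSERT INTO events (happened_at, amount) VALUES (?, ?)",
    [("2024-01-01 09:30:00", 10.0),
     ("2024-01-01 17:45:00", 5.0),
     ("2024-01-02 08:00:00", 7.5)],
)

# Function around the column: the index on happened_at cannot be searched.
WRAPPED = "SELECT id, amount FROM events WHERE date(happened_at) = '2024-01-01'"
# Rewrite with the bare column compared against a half-open range.
RANGE = ("SELECT id, amount FROM events "
         "WHERE happened_at >= '2024-01-01' AND happened_at < '2024-01-02'")

def plan(sql: str) -> list:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

print(plan(WRAPPED))  # a full table scan of events
print(plan(RANGE))    # a search using idx_events_happened_at
```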