I'm having a performance problem.
I created a table that receives data from a file through a BULK INSERT. Then I run a SELECT with multiple INNER JOINs (11 tables in total) to insert the right data into another table.
When I run this SELECT, it takes too long (more than an hour), so I stop it. My solution was to break the query into 3 parts using @temp table variables. To my surprise, that takes 3 minutes. That's what I'm trying to understand: WHY is breaking my query into 3 parts FASTER than a single SELECT statement? Here is my query:
```sql
SELECT t1.ReturnINT, t1.ReturnBIT, t2.ReturnINT, t3.ReturnINT, t5.ReturnINT, t1.ReturnDateTime
FROM t1
INNER JOIN t2
    ON t2.my_column_varchar = t1.my_column_varchar
INNER JOIN t3
    ON t3.my_column_number = t1.my_column_number AND t2.my_column_ID = t3.my_column_ID
INNER JOIN t4
    ON t4.my_column_varchar = t1.my_column_varchar
INNER JOIN t5
    ON t5.my_column_int = t1.my_column_int AND t5.my_column_int = t4.my_column_int AND t2.my_column_int = t5.my_column_int
INNER JOIN t6
    ON t6.my_column_int = t5.my_column_int AND t6.my_column_int = t2.my_column_int
INNER JOIN t7
    ON t7.my_column_int = t6.my_column_int
INNER JOIN t8
    ON t8.my_column_int = t3.my_column_int AND t8.my_column_datetime = t1.my_column_datetime
INNER JOIN t9
    ON t9.my_column_int = t3.my_column_int AND t8.my_column_datetime BETWEEN t9.my_column_datetime1 AND t9.datetime1 + t9.my_column_datetime2
INNER JOIN t10
    ON t10.my_column_int = t9.my_column_int AND t10.my_column_int = t6.my_column_int
INNER JOIN t11
    ON t11.my_column_int = t9.my_column_int AND t8.my_column_datetime = t11.my_column_datetime
```
----EDITED----
There is NO WHERE clause; my query is exactly as shown here.
Here are my broken-up queries (I forgot to include them earlier). They run in 3 minutes:
```sql
-- Stage 1: join t1 through t5 and materialize the result
DECLARE @temp TABLE (
    <Some_columns>
)
INSERT INTO @temp
SELECT <My_Linked_Columns>
FROM t1
INNER JOIN t2
    ON t2.my_column_varchar = t1.my_column_varchar
INNER JOIN t3
    ON t3.my_column_number = t1.my_column_number AND t2.my_column_ID = t3.my_column_ID
INNER JOIN t4
    ON t4.my_column_varchar = t1.my_column_varchar
INNER JOIN t5
    ON t5.my_column_int = t1.my_column_int AND t5.my_column_int = t4.my_column_int AND t2.my_column_int = t5.my_column_int

-- Stage 2: join the stage-1 rows to t6, t7 and t8
DECLARE @temp2 TABLE (
    <Some_Columns>
)
INSERT INTO @temp2
SELECT <More_Linked_Columns>
FROM @temp AS temp
INNER JOIN t6
    ON t6.my_column_int = temp.my_column_int AND t6.my_column_int = temp.my_column_int
INNER JOIN t7
    ON t7.my_column_int = t6.my_column_int
INNER JOIN t8
    ON t8.my_column_int = temp.my_column_int AND t8.my_column_datetime = temp.my_column_datetime

-- Stage 3: join the stage-2 rows to t9, t10 and t11
DECLARE @temp3 TABLE (
    <Some_Columns>
)
INSERT INTO @temp3
SELECT <More_Linked_Columns>
FROM @temp2 AS temp2
INNER JOIN t9
    ON t9.my_column_int = temp2.my_column_int AND temp2.my_column_datetime BETWEEN t9.my_column_datetime1 AND t9.datetime1 + t9.my_column_datetime2
INNER JOIN t10
    ON t10.my_column_int = t9.my_column_int AND t10.my_column_int = temp2.my_column_int
INNER JOIN t11
    ON t11.my_column_int = t9.my_column_int AND temp2.my_column_datetime = t11.my_column_datetime

-- Final result
SELECT <All_Final_Columns>
FROM @temp3
```
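The staging idea can be reproduced on a small scale. The sketch below is an assumption-laden illustration, not the poster's schema. It uses Python's built-in sqlite3 module, with SQLite TEMP tables standing in for the T-SQL table variables, and the tables t1 to t4 and their columns are hypothetical. It checks only that joining in stages returns the same rows as one big join. Staging fixes the join order and materializes each intermediate result, but the speed difference itself depends on SQL Server's optimizer and is not reproduced here.

```python
import sqlite3

# SQLite stands in for SQL Server; the tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, k INTEGER);
CREATE TABLE t2 (k INTEGER, v TEXT);
CREATE TABLE t3 (k INTEGER, w TEXT);
CREATE TABLE t4 (w TEXT, label TEXT);
INSERT INTO t1 (k) VALUES (1), (2), (3);
INSERT INTO t2 VALUES (1, 'a'), (2, 'b');
INSERT INTO t3 VALUES (1, 'x'), (2, 'y'), (2, 'z');
INSERT INTO t4 VALUES ('x', 'X1'), ('z', 'Z1');
""")

# One statement joining every table at once.
single = conn.execute("""
    SELECT t1.id, t2.v, t4.label
    FROM t1
    INNER JOIN t2 ON t2.k = t1.k
    INNER JOIN t3 ON t3.k = t1.k
    INNER JOIN t4 ON t4.w = t3.w
    ORDER BY 1, 2, 3
""").fetchall()

# Stage 1: materialize the first part of the join into a temp table.
conn.execute("""
    CREATE TEMP TABLE stage1 AS
    SELECT t1.id, t1.k, t2.v
    FROM t1 INNER JOIN t2 ON t2.k = t1.k
""")

# Stage 2: join the intermediate result to the remaining tables.
staged = conn.execute("""
    SELECT s.id, s.v, t4.label
    FROM stage1 AS s
    INNER JOIN t3 ON t3.k = s.k
    INNER JOIN t4 ON t4.w = t3.w
    ORDER BY 1, 2, 3
""").fetchall()

print(single == staged, staged)  # True [(1, 'a', 'X1'), (2, 'b', 'Z1')]
```

For inner joins the two forms are logically equivalent as long as each stage carries forward the columns the later joins need. Only the execution strategy changes.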
----EDITED 3----
After more investigation I found a problem in the execution plan: a Nested Loops operator that estimates 1 row but actually returns 1,204,014 rows. I guess the problem is exactly there, but I haven't found a way to solve it without breaking my query into 3 parts. (Now I know why breaking it up is faster, hehehe.)
How do I prevent duplicate rows when joining multiple tables? Solution: select the column values in a specific order within each row, so that rows containing the same set of values become identical. Then you can use SELECT DISTINCT to remove the duplicates.
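Here is a minimal sketch of that ordering-plus-DISTINCT trick. It assumes a hypothetical product table and uses Python's sqlite3 as a stand-in for SQL Server. A self-join on category produces each pair twice, as (1, 2) and (2, 1). The CASE expressions always put the smaller id first, so the two copies become identical and DISTINCT can collapse them. CASE is used because it is portable between SQLite and T-SQL.

```python
import sqlite3

# SQLite stands in for SQL Server; the product table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, category TEXT);
INSERT INTO product VALUES (1, 'A'), (2, 'A'), (3, 'B'), (4, 'B');
""")

# Self-join: every pair of products in the same category, in both orders.
raw = conn.execute("""
    SELECT p1.id, p2.id
    FROM product AS p1
    INNER JOIN product AS p2
        ON p2.category = p1.category AND p2.id <> p1.id
""").fetchall()

# Put the smaller id first in every row, then let DISTINCT drop the now-identical rows.
pairs = conn.execute("""
    SELECT DISTINCT
        CASE WHEN p1.id < p2.id THEN p1.id ELSE p2.id END AS first_id,
        CASE WHEN p1.id < p2.id THEN p2.id ELSE p1.id END AS second_id
    FROM product AS p1
    INNER JOIN product AS p2
        ON p2.category = p1.category AND p2.id <> p1.id
    ORDER BY first_id, second_id
""").fetchall()

print(len(raw), pairs)  # 4 [(1, 2), (3, 4)]
```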
I won't leave you in suspense: between joins and subqueries, joins tend to execute faster. In many cases, a query written with joins will outperform one that uses a subquery. The reason is that a join reduces the processing burden on the database by replacing multiple queries (for example, a subquery evaluated once per outer row) with a single join query.
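As an illustration of that rewrite, the following minimal sketch expresses a correlated subquery as a single join plus GROUP BY. It uses Python's sqlite3 as a stand-in for SQL Server, and the customer/orders tables are hypothetical. It only checks that both forms return the same rows. It does not measure speed. Many optimizers can internally rewrite one form into the other, so any timing difference should be checked on your own data.

```python
import sqlite3

# SQLite stands in for SQL Server; the customer/orders tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customer VALUES (1, 'Ana'), (2, 'Bruno'), (3, 'Caio');
INSERT INTO orders VALUES (10, 1, 50), (11, 1, 70), (12, 2, 20);
""")

# Subquery form: the inner SELECTs are logically evaluated once per customer row.
subquery_rows = conn.execute("""
    SELECT c.name,
           (SELECT SUM(o.amount) FROM orders AS o WHERE o.customer_id = c.id) AS total
    FROM customer AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.name
""").fetchall()

# Join form: one join plus GROUP BY answers the same question in a single set operation.
join_rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customer AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

print(subquery_rows == join_rows, join_rows)  # True [('Ana', 120), ('Bruno', 20)]
```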
The most common reasons:
Reason 1: When two tables with n and m rows that take part in an INNER JOIN have a many-to-many relationship, the INNER JOIN can approach a CROSS JOIN and produce a result set with more than MAX(n, m) rows; in the worst case, n × m rows.
Now imagine many such tables chained together with INNER JOINs.
Each additional join can multiply the intermediate result set, which keeps growing and starts eating into the memory allocated to the query.
This could be a reason why breaking the query up with temp tables helps you.
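This small sketch shows the row multiplication described in Reason 1. It uses Python's sqlite3 as a stand-in for SQL Server, with hypothetical tables a, b and c holding 3, 4 and 5 rows. Every row shares the same join key, which is the extreme many-to-many case.

```python
import sqlite3

# SQLite stands in for SQL Server; tables a, b and c are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, k INTEGER);
CREATE TABLE b (id INTEGER PRIMARY KEY, k INTEGER);
CREATE TABLE c (id INTEGER PRIMARY KEY, k INTEGER);
""")

# 3, 4 and 5 rows that all share the same join key: a many-to-many relationship.
conn.executemany("INSERT INTO a (k) VALUES (?)", [(1,)] * 3)
conn.executemany("INSERT INTO b (k) VALUES (?)", [(1,)] * 4)
conn.executemany("INSERT INTO c (k) VALUES (?)", [(1,)] * 5)

two_way = conn.execute(
    "SELECT COUNT(*) FROM a INNER JOIN b ON b.k = a.k"
).fetchone()[0]
three_way = conn.execute(
    "SELECT COUNT(*) FROM a INNER JOIN b ON b.k = a.k INNER JOIN c ON c.k = b.k"
).fetchone()[0]

print(two_way, three_way)  # 12 60
```

The output is 3 × 4 = 12 rows after one join and 3 × 4 × 5 = 60 after two, even though no table holds more than 5 rows.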
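Reason 2: You don't have indexes on the columns you are joining the tables on, so the engine has to scan whole tables (or build temporary structures) instead of seeking directly to the matching rows.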
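One way to confirm Reason 2 is to compare the query plan before and after indexing the join columns. In SQL Server you would look at the actual execution plan. The sketch below does the equivalent with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3. The t1/t2 columns mirror the question's names, but the tables themselves and the idx_ index names are hypothetical. The exact plan wording varies between SQLite versions, so the check only looks for the index names.

```python
import sqlite3

# SQLite stands in for SQL Server; the tables and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, my_column_varchar TEXT, payload TEXT);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, my_column_varchar TEXT, payload TEXT);
""")

QUERY = """
    SELECT t1.payload, t2.payload
    FROM t1
    INNER JOIN t2 ON t2.my_column_varchar = t1.my_column_varchar
"""

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Without indexes, SQLite must scan (it may build a temporary automatic index).
before = plan(QUERY)

# Index the join columns, then ask for the plan again.
conn.execute("CREATE INDEX idx_t1_varchar ON t1 (my_column_varchar)")
conn.execute("CREATE INDEX idx_t2_varchar ON t2 (my_column_varchar)")
after = plan(QUERY)

print(before)
print(after)
```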
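Reason 3: Are you applying functions to columns in the WHERE clause (or in join conditions)? Wrapping a column in a function or expression usually prevents the optimizer from seeking on an index on that column.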
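The following sketch illustrates Reason 3. It again uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in for SQL Server's execution plan, with a hypothetical table and index. The same filter is written twice: once on the bare column and once with the column wrapped in UPPER(). Only the first form can seek into the index.

```python
import sqlite3

# SQLite stands in for SQL Server; the table and index are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, my_column_varchar TEXT, payload TEXT);
CREATE INDEX idx_t1_varchar ON t1 (my_column_varchar);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Comparing the bare column: the planner can search idx_t1_varchar.
sargable = plan("SELECT payload FROM t1 WHERE my_column_varchar = 'ABC'")

# Wrapping the column in UPPER(): the index on the raw column no longer applies.
non_sargable = plan("SELECT payload FROM t1 WHERE UPPER(my_column_varchar) = 'ABC'")

print(sargable)
print(non_sargable)
```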