Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is subquery and join so slow

I need to select rows from BUNDLES table which have one of several SAP_STATE_ID values. Those values depends on whether respective SAP status is supposed to be exported or not.

This query runs really fast (there is index on SAP_STATE_ID field) -

SELECT b.* FROM BUNDLES b WHERE b.SAP_STATE_ID IN (2,3,5,6)

But... I'd like to fetch list of IDs dynamically, like this:

SELECT b.* FROM BUNDLES b 
WHERE b.SAP_STATE_ID IN 
(SELECT s.SAP_STATE_ID FROM SAP_STATES s WHERE s.EXPORT_TO_SAP = 1)

And ouch, this query is suddenly taking too much time. I would expect SQL server to run the subquery first (it doesn't depend on anything from main query) and then run whole thing just like in my first example. I tried to rewrite it to use joins instead of subquery:

SELECT b.* FROM BUNDLES b 
JOIN SAP_STATES s ON (s.SAP_STATE_ID = b.SAP_STATE_ID) 
WHERE s.EXPORT_TO_SAP = 1

but it has same poor performance. It seems like it is running the subquery for each row of BUNDLES table or something like this. I am not very skilled in reading execution plans, but I tried. It says that 81% cost is for scanning Primary key index of BUNDLES (I have no idea why it should do such a thing, there is BUNDLE_ID field defined as PRIMARY KEY, but it doesn't appear in the query at all...)

Does anyone have an explanation why is SQL server so "stupid"? Is there a way to achieve what I want with good performance but without the need to provide static list of SAP_STATE_IDs?

script for both tables and relevant indexes - http://mab.to/xbYiI0wKj

execution plan for subquery version - http://mab.to/8Qh6gpdYZ

query plan for version with joins - http://mab.to/YCqeGCUbr

(for some reason these two plans looks the same and both suggest creating BUNDLES.SAP_STATE_ID index, which is already there)

like image 602
lot Avatar asked Sep 30 '14 15:09

lot


People also ask

Why might a subquery join Slow?

Lack of indexes If there aren't any indexes on the pertinent columns, the RDBMS may have to resort to full table scans which could be slow.

Which is faster subqueries or joins?

I won't leave you in suspense, between Joins and Subqueries, joins tend to execute faster. In fact, query retrieval time using joins will almost always outperform one that employs a subquery. The reason is that joins mitigate the processing burden on the database by replacing multiple queries with one join query.

Is it better to use CTE or subquery?

CTE can be more readable: Another advantage of CTE is CTE are more readable than Subqueries. Since CTE can be reusable, you can write less code using CTE than using subquery. Also, people tend to follow the logic and ideas easier in sequence than in a nested fashion.


2 Answers

I am pretty sure your statistics are off on the tables. If you want to get it working in a hurry I would write the query as:

SELECT b.*
  FROM SAP_STATES s 
 INNER LOOP JOIN BUNDLES b 
    ON s.SAP_STATE_ID = b.SAP_STATE_ID
 WHERE s.EXPORT_TO_SAP = 1

This forces a nested loops join over SAP_STATES which filters on BUNDLES

like image 153
Filip De Vos Avatar answered Oct 23 '22 13:10

Filip De Vos


When you use tables(temporary or physical), the SQL engine builds statistics against it and thus has a very clear idea on the number of rows in it and which is the best execution approach for it. On the other hand, a computed table(sub query) doesn't have statistics against it.

So while it might be seemingly simple for a human to deduce the number of rows in it, the "stupid" SQL Engine is unaware of all this. Now, coming to the query, the WHERE s.EXPORT_TO_SAP = 1 clause is making a world of difference here. The clustered index is sorted and built on the SAP_STATE_ID, but to additionally check the WHERE clause, it has no option but to scan the entire table(in the final dataset)! I bet that if instead of a clustered index, if there was a non clustered covered index on SAP_STATE_ID column which covered the EXPORT_TO_SAP field, it might have done the trick. Since clustered index scans are generally bad for performance, I would suggest you to take the below approach:

SELECT s.SAP_STATE_ID 
into #Sap_State
FROM SAP_STATES s WHERE s.EXPORT_TO_SAP = 1

SELECT b.* FROM BUNDLES b 
join #Sap_State a on a.sap_state_id = b.sap_state_id
like image 42
SouravA Avatar answered Oct 23 '22 12:10

SouravA