I need to select rows from BUNDLES table which have one of several SAP_STATE_ID values. Those values depends on whether respective SAP status is supposed to be exported or not.
This query runs really fast (there is index on SAP_STATE_ID field) -
SELECT b.* FROM BUNDLES b WHERE b.SAP_STATE_ID IN (2,3,5,6)
But... I'd like to fetch list of IDs dynamically, like this:
SELECT b.* FROM BUNDLES b
WHERE b.SAP_STATE_ID IN
(SELECT s.SAP_STATE_ID FROM SAP_STATES s WHERE s.EXPORT_TO_SAP = 1)
And ouch, this query is suddenly taking too much time. I would expect SQL server to run the subquery first (it doesn't depend on anything from main query) and then run whole thing just like in my first example. I tried to rewrite it to use joins instead of subquery:
SELECT b.* FROM BUNDLES b
JOIN SAP_STATES s ON (s.SAP_STATE_ID = b.SAP_STATE_ID)
WHERE s.EXPORT_TO_SAP = 1
but it has same poor performance. It seems like it is running the subquery for each row of BUNDLES table or something like this. I am not very skilled in reading execution plans, but I tried. It says that 81% cost is for scanning Primary key index of BUNDLES (I have no idea why it should do such a thing, there is BUNDLE_ID field defined as PRIMARY KEY, but it doesn't appear in the query at all...)
Does anyone have an explanation why is SQL server so "stupid"? Is there a way to achieve what I want with good performance but without the need to provide static list of SAP_STATE_IDs?
script for both tables and relevant indexes - http://mab.to/xbYiI0wKj
execution plan for subquery version - http://mab.to/8Qh6gpdYZ
query plan for version with joins - http://mab.to/YCqeGCUbr
(for some reason these two plans looks the same and both suggest creating BUNDLES.SAP_STATE_ID index, which is already there)
Lack of indexes If there aren't any indexes on the pertinent columns, the RDBMS may have to resort to full table scans which could be slow.
I won't leave you in suspense, between Joins and Subqueries, joins tend to execute faster. In fact, query retrieval time using joins will almost always outperform one that employs a subquery. The reason is that joins mitigate the processing burden on the database by replacing multiple queries with one join query.
CTE can be more readable: Another advantage of CTE is CTE are more readable than Subqueries. Since CTE can be reusable, you can write less code using CTE than using subquery. Also, people tend to follow the logic and ideas easier in sequence than in a nested fashion.
I am pretty sure your statistics are off on the tables. If you want to get it working in a hurry I would write the query as:
SELECT b.*
FROM SAP_STATES s
INNER LOOP JOIN BUNDLES b
ON s.SAP_STATE_ID = b.SAP_STATE_ID
WHERE s.EXPORT_TO_SAP = 1
This forces a nested loops join over SAP_STATES
which filters on BUNDLES
When you use tables(temporary or physical), the SQL engine builds statistics against it and thus has a very clear idea on the number of rows in it and which is the best execution approach for it. On the other hand, a computed table(sub query) doesn't have statistics against it.
So while it might be seemingly simple for a human to deduce the number of rows in it, the "stupid" SQL Engine is unaware of all this. Now, coming to the query, the WHERE s.EXPORT_TO_SAP = 1
clause is making a world of difference here. The clustered index is sorted and built on the SAP_STATE_ID, but to additionally check the WHERE clause, it has no option but to scan the entire table(in the final dataset)! I bet that if instead of a clustered index, if there was a non clustered covered index on SAP_STATE_ID column which covered the EXPORT_TO_SAP field, it might have done the trick. Since clustered index scans are generally bad for performance, I would suggest you to take the below approach:
SELECT s.SAP_STATE_ID
into #Sap_State
FROM SAP_STATES s WHERE s.EXPORT_TO_SAP = 1
SELECT b.* FROM BUNDLES b
join #Sap_State a on a.sap_state_id = b.sap_state_id
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With