This is a question about SQL Server 2008 R2
I'm not a DBA, by far. I'm a java developer, who has to write SQL from time to time. (mostly embedded in code). I want to know if I did something wrong here, and if so, what I can do to avoid it to happen again.
Q1:
SELECT something FROM (SELECT * FROM T1 WHERE condition1) JOIN ... Q1 features 14 joins
Q2 is the same as Q1, with one exception. (SELECT * FROM T1 WHERE condition1) is executed before, and stored in a temp table.
This is not a correlated sub-query.
Q2:
SELECT * INTO #tempTable FROM T1 WHERE condition1 SELECT something FROM #tempTable  JOIN ... again, 14 joins.
The thing that puzzles me now is that Q1 took > 2min, (tried it a few times, to avoid caching to play a role) while Q2 (both queries combined) took 2sec!!! What gives?
This biggest difference is that a CTE can only be used in the current query scope whereas a temporary table or table variable can exist for the entire duration of the session allowing you to perform many different DML operations against them.
It is a temporary result set and typically it may be a result of complex sub-query. Unlike the temporary table, its life is limited to the current query. It is defined by using WITH statement. CTE improves readability and ease in maintenance of complex queries and sub-queries.
However, if there is a memory pressure the pages belonging to a table variable may be pushed to tempdb. ⇒ Table variables cannot be involved in transactions, logging or locking. This makes @table faster then #temp. So table variable is faster then temporary table.
It says that temp tables are always on disk, and table variables are in memory, that is to say, the performance of table variable is better than temp table because table variable uses less IO operations than temp table.
Why it's not recommended to use subqueries?
Database Optimizer (regardless of what database you are using) can not always properly optimize such query (with subqueries). In this case, the problem to the optimizer is to choose the right way to join result sets. There are several algorithms for joining two result sets. The choice of the algorithm depends on the number of records which are contained in one and in the other result set. In case if you join two physical tables (subquery is not a physical table), the database can easily determine the amount of data in two result sets by the available statistics. If one of result sets is a subquery then to understand how many records it returns is very difficult. In this case the database can choose wrong query plan of join, so that will lead to a dramatic reduction in the performance of the query.
Rewriting the query with using temporary tables is intended to simplify the database optimizer. In the rewritten query all result sets participating in joins will be physical tables and the database will easily determine the length of each result set. This will allow the database to choose the guaranteed fastest of all possible query plans. Moreover, the database will make the right choice no matter what are the conditions. The rewritten query with temporary tables would work well on any database, this is especially important in the development of portable solutions. In addition, the rewritten query is easier to read, easier to understand and to debug.
It is understood that rewriting the query with temporary tables can lead to some slowdown due to additional expenses: creation of temporary tables. If the database will not be mistaken with the choice of the query plan, it will perform the old query faster than a new one. However, this slowdown will always be negligible. Typically the creation of a temporary table takes a few milliseconds. That is, the delay can not have a significant impact on system performance, and usually can be ignored.
Important! Do not forget to create indexes for temporary tables. The index fields should include all fields that are used in join conditions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With