
Why is performance increased when moving from a derived table to a temp table solution?

I'm reading "Dissecting SQL Server Execution Plans" by Grant Fritchey, and it's helping me a lot to see why certain queries are slow.

However, I am stumped by a case where a simple rewrite performs a lot faster.

My first attempt uses a derived table and takes 21 seconds:

-- 21 secs
SELECT *
FROM   Table1 AS o
JOIN   (
        -- values of col1 that occur more than once
        SELECT   col1
        FROM     Table1
        GROUP BY col1
        HAVING   COUNT(*) > 1
       ) AS i ON i.col1 = o.col1

My second attempt simply moves the derived table out into a temp table, and it is three times faster:

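-- 7 secs
-- Step 1: materialize the duplicated col1 values into a temp table
SELECT   col1
INTO     #doubles
FROM     Table1
GROUP BY col1
HAVING   COUNT(*) > 1

-- Step 2: join the temp table back to the base table
SELECT *
FROM   Table1 AS o
JOIN   #doubles AS i ON i.col1 = o.col1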
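One likely reason the rewrite helps, consistent with the statistics diagnosis in the answer below, is that once #doubles has been populated it is a real table with a known row count, and SQL Server can build statistics on it, so the second statement's join is compiled with realistic estimates. As a hedged sketch, this catalog query, run in the same session right after the two statements, lists any statistics SQL Server has created on the temp table:

```sql
-- Run in the same session, after the two statements above.
-- Lists the statistics objects (if any) that exist on the temp table #doubles.
SELECT s.name,
       s.auto_created,
       s.user_created
FROM   tempdb.sys.stats AS s
WHERE  s.object_id = OBJECT_ID(N'tempdb..#doubles');
```

The derived table, by contrast, only exists inside the single query, so its row count has to be estimated from the statistics on Table1.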

My main interest is in why moving from a derived table to a temp table improves performance so much, not in how to make it even faster.

I would be grateful if someone could show me how I can diagnose this issue using the (graphical) execution plan.

XML execution plan: https://www.sugarsync.com/pf/D6486369_1701716_16980

Edit 1

Once I created statistics on the two columns that are actually specified in the GROUP BY, the optimizer started doing "the right thing", after I freed the procedure cache (don't forget that if you are a beginner!). I had simplified the query in the question, which in retrospect was not a good simplification. The attached .sqlplan shows the two columns, but this was not obvious.

The estimates are now a lot more accurate, and so is the performance, which is now on par with the temp table solution. As you know, the optimizer creates statistics on single columns automatically (unless that is disabled), but two-column statistics have to be created by the DBA.
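A minimal sketch of the fix described above, assuming the two GROUP BY columns are named col1 and col2 and that Table1 is in the dbo schema. col2 and the statistic name are hypothetical, since the simplified query in the question only shows col1. Note that DBCC FREEPROCCACHE clears every cached plan on the instance, so on a shared server recompiling just the query under test may be the gentler option.

```sql
-- Hypothetical names: col2 and st_Table1_col1_col2 are placeholders.
-- Create a two-column statistic; the optimizer only auto-creates single-column ones.
CREATE STATISTICS st_Table1_col1_col2
    ON dbo.Table1 (col1, col2);

-- Discard cached plans so the next run compiles a fresh plan.
-- Warning: this clears ALL cached plans on the instance.
DBCC FREEPROCCACHE;

-- Alternative with a smaller blast radius: force a fresh plan for this query only.
SELECT o.*
FROM   dbo.Table1 AS o
JOIN   (SELECT   col1, col2
        FROM     dbo.Table1
        GROUP BY col1, col2
        HAVING   COUNT(*) > 1) AS i
       ON  i.col1 = o.col1
       AND i.col2 = o.col2
OPTION (RECOMPILE);
```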

A (nonclustered) index on these two columns made the query perform the same, but in this case a statistic is just as good, and it doesn't carry the overhead of index maintenance. I'm going forward with the two-column statistic to see how it performs. @Grant Do you know whether the statistics on an index are more reliable than those of a column statistic?
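For comparison, a sketch of the index alternative, again assuming a hypothetical col2 and a made-up index name. The second query lists every statistic on the table together with how many rows were sampled to build it, which is one concrete thing to compare when asking whether one statistic is more reliable than another. It relies on sys.dm_db_stats_properties, which requires SQL Server 2008 R2 SP2 / 2012 SP1 or later.

```sql
-- Hypothetical index name and column col2.
CREATE NONCLUSTERED INDEX IX_Table1_col1_col2
    ON dbo.Table1 (col1, col2);

-- One row per statistic on Table1 (index statistics and column statistics),
-- showing table rows at the time of the update vs rows actually sampled.
SELECT s.name,
       s.auto_created,
       s.user_created,
       sp.rows,
       sp.rows_sampled,
       sp.last_updated
FROM   sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE  s.object_id = OBJECT_ID(N'dbo.Table1');
```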

Edit 2

Once a problem is solved, I always follow up on how a similar problem could be diagnosed faster in the future.

The problem here was that the estimated row counts were way off. The graphical execution plan shows these when you hover over an operator, but that's about it.

Some tools that can help:

  1. SET STATISTICS PROFILE ON

I've heard this one will become obsolete and be replaced by its XML variant, but I still like its output, which is in grid format. Here the big difference between the "Rows" and "EstimateRows" columns would have shown the problem.
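A minimal sketch of using it on the slow query from the question. As I understand the output, Rows is the actual count summed over all executions of an operator, while EstimateRows is per execution (there is a separate EstimateExecutions column), so compare them with that in mind. The commented-out XML variant returns the actual plan instead, where each RelOp element carries an EstimateRows attribute and its runtime counters carry ActualRows.

```sql
-- Return per-operator runtime statistics as an extra result grid.
SET STATISTICS PROFILE ON;

SELECT o.*
FROM   Table1 AS o
JOIN   (SELECT   col1
        FROM     Table1
        GROUP BY col1
        HAVING   COUNT(*) > 1) AS i
       ON i.col1 = o.col1;

SET STATISTICS PROFILE OFF;

-- XML variant (actual execution plan as XML):
-- SET STATISTICS XML ON;
-- ...query...
-- SET STATISTICS XML OFF;
```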

  2. External Tool: SQL Sentry Plan Explorer http://www.sqlsentry.net/

This is a nice tool, especially if you are a beginner. It highlights problems.


  3. External Tool: SSMS Tools Pack http://www.ssmstoolspack.com/

A more general-purpose tool, but it also directs the user to potential problems.


Kind Regards, Tom

asked Feb 28 '12 at 14:02 by buckley




1 Answer

Looking at the values for the first execution plan, it looks like it's statistics. You have an estimated number of rows at 800 and an actual of 1.2 million. I think you'll find that updating the statistics will change the way the first query's plan is generated.
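A minimal sketch of acting on this advice, assuming Table1 is in the dbo schema. FULLSCAN reads every row instead of a sample, which costs more on a large table but gives the optimizer the most accurate histogram; afterwards, rerun the query (for example with OPTION (RECOMPILE)) and check whether the estimated and actual row counts now agree.

```sql
-- Rebuild every statistic on the table from all rows rather than a sample.
UPDATE STATISTICS dbo.Table1 WITH FULLSCAN;

-- Broader alternative: refresh (sampled) statistics for all tables
-- in the current database that have had modifications.
-- EXEC sp_updatestats;
```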

answered Sep 29 '22 at 00:09 by Grant Fritchey