I'm reading "Dissecting SQL Server Execution Plans" from Grant Fritchey and it's helping me a lot to see why certain queries are slow.
However, I am stumped with this case where a simple rewrite performs quite a lot faster.
This is my first attempt and it takes 21 secs. It uses a derived table:
-- 21 secs
SELECT *
FROM Table1 AS o
JOIN (
    SELECT col1
    FROM Table1
    GROUP BY col1
    HAVING COUNT(*) > 1
) AS i ON i.col1 = o.col1;
My second attempt simply moves the derived table out into a temp table, and it is 3 times faster:
-- 7 secs
SELECT col1
INTO #doubles
FROM Table1
GROUP BY col1
HAVING COUNT(*) > 1;

SELECT *
FROM Table1 AS o
JOIN #doubles AS i ON i.col1 = o.col1;
My main interest is in why moving from a derived table to a temp table improves performance so much, not in how to make it even faster.
I would be grateful if someone could show me how I can diagnose this issue using the (graphical) execution plan.
XML execution plan: https://www.sugarsync.com/pf/D6486369_1701716_16980
Edit 1
When I created statistics on the 2 columns that were specified in the GROUP BY, the optimizer started doing "the right thing" (after clearing the procedure cache; don't forget to do that if you are a beginner!). In retrospect, the way I simplified the query in the question was not a good simplification: the attached sqlplan shows the 2 columns, but this was not obvious.
The estimates are now a lot more accurate, as is the performance, which is on par with the temp table solution. As you know, the optimizer creates statistics on single columns automatically (if not disabled), but two-column statistics have to be created by the DBA.
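For reference, here is roughly what I ran; the statistic name is my own, and assume the two grouping columns are col1 and col2 (the real names differ):

-- The optimizer only auto-creates single-column statistics,
-- so create the two-column statistic by hand.
CREATE STATISTICS Stats_col1_col2 ON Table1 (col1, col2);

-- Clear the plan cache so the old cached plan doesn't mask the new statistics
-- (fine on a dev box; on a shared server, evict just the one plan instead).
DBCC FREEPROCCACHE;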
A (non-clustered) index on these 2 columns made the query perform the same, but in this case a statistic is just as good, and it doesn't suffer the downside of index maintenance. I'm going forward with the two-column statistic and will see how it performs. @Grant Do you know if the statistics on an index are more reliable than those of a column statistic?
Edit 2
Once a problem is solved, I always follow up on how a similar problem can be diagnosed faster in the future.
The problem here was that the estimated row counts were way off. The graphical execution plan shows these when you hover over an operator, but that's about it.
Some tools that can help:
SET STATISTICS PROFILE: I heard this one will become obsolete and be replaced by its XML variant, but I still like the output, which is in grid format. Here the big difference between the "Rows" and "EstimateRows" columns would have shown the problem (see the sketch after this list).
This is a nice tool, especially if you are a beginner: it highlights problems.
A more general-purpose tool, but again it directs the user to potential problems.
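A minimal sketch of the first option, run against the query from the question:

-- Show per-operator runtime counters in grid format;
-- compare the "Rows" (actual) and "EstimateRows" (estimated) columns.
SET STATISTICS PROFILE ON;

SELECT *
FROM Table1 AS o
JOIN (
    SELECT col1
    FROM Table1
    GROUP BY col1
    HAVING COUNT(*) > 1
) AS i ON i.col1 = o.col1;

SET STATISTICS PROFILE OFF;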
Kind Regards, Tom
Advantages of temporary tables: you can create a temporary table and insert, delete, and update its records without worrying about whether you have sufficient rights to change data in permanent tables, or whether you might accidentally be doing so.
The reason temp tables are faster at loading data is that they are created in tempdb, where logging works very differently: data modifications are minimally logged compared to a regular table, hence operations with temp tables are faster.
Looking at the values in the first execution plan, it looks like it's statistics. You have an estimated number of rows of 800 and an actual of 1.2 million. I think you'll find that updating the statistics will change the way the first query's plan is generated.
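A minimal sketch of that fix, using the table from the question (WITH FULLSCAN is my choice here; the default sampled scan may be enough):

-- Refresh the statistics on Table1 from a full scan of the data,
-- then clear the cached plan so the next run is compiled with fresh estimates.
UPDATE STATISTICS Table1 WITH FULLSCAN;
DBCC FREEPROCCACHE;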