
SQL Server CTE referred in self joins slow

I have written a table-valued UDF that starts with a CTE returning a subset of the rows from a large table. The CTE contains several joins: a couple of inner joins and one left join to other tables, none of which contain many rows. The CTE has a WHERE clause that restricts the rows to a date range, so that only the rows needed are returned.

I then reference this CTE in 4 self left joins, in order to build subtotals using different criteria.

The query is quite complex, but here is a simplified pseudo-version of it:

WITH DataCTE as
(
     SELECT [columns] FROM table
                      INNER JOIN table2
                      ON [...]

                      INNER JOIN table3
                      ON [...]

                      LEFT JOIN table3
                      ON [...]
)
SELECT [aggregates_columns of each subset] FROM DataCTE Main
LEFT JOIN DataCTE BananasSubset
               ON [...] 
             AND Product = 'Bananas'
             AND Quality = 100
LEFT JOIN DataCTE DamagedBananasSubset
               ON [...]
             AND Product = 'Bananas'
             AND Quality < 20
LEFT JOIN DataCTE MangosSubset
               ON [...]
GROUP BY [...]

I have the feeling that SQL Server gets confused and calls the CTE for each self join, which seems confirmed by looking at the execution plan, although I confess not being an expert at reading those.

I would have assumed SQL Server to be smart enough to perform the data retrieval from the CTE only once, rather than several times.

I have tried the same approach, but rather than using a CTE to get the subset of the data, I used the same SELECT query as in the CTE and made it output to a temp table instead.

The version referencing the CTE takes 40 seconds. The version referencing the temp table takes between 1 and 2 seconds.

Why isn't SQL Server smart enough to keep the CTE results in memory?

I like CTEs, especially in this case as my UDF is a table-valued one, so it allowed me to keep everything in a single statement.

To use a temp table, I would need to write a multi-statement table valued UDF, which I find a slightly less elegant solution.

Have any of you had this kind of performance issue with CTEs, and if so, how did you sort it out?

Thanks,

Kharlos

Kharlos Dominguez asked Jun 16 '10 15:06



1 Answer

I believe a CTE's results are not cached: its underlying query is re-executed each time the CTE is referenced. A temp table's results, by contrast, are stored until the table is dropped. That would explain the performance gain you saw when you switched to a temp table.

Another benefit is that you can create indexes on a temporary table, which you can't do on a CTE. I'm not sure whether that would help in your situation, but it's good to know.
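As a rough sketch of what that could look like: the script below materializes the date-filtered subset once, indexes it, and then runs the self joins against the temp table. The table dbo.Sales, its columns (SaleId, SaleDate, Product, Quality, Amount), the join on SaleId, and the date range are all made up for illustration; substitute your own SELECT and join conditions.

```sql
-- Hypothetical schema: dbo.Sales(SaleId, SaleDate, Product, Quality, Amount),
-- with SaleId assumed to be unique.
DECLARE @From datetime, @To datetime;
SET @From = '20100101';
SET @To   = '20100201';

-- Materialize the filtered subset once (the same SELECT the CTE would hold).
SELECT s.SaleId, s.SaleDate, s.Product, s.Quality, s.Amount
INTO   #Subset
FROM   dbo.Sales AS s
WHERE  s.SaleDate >= @From AND s.SaleDate < @To;

-- Index the column the self joins match on.
CREATE UNIQUE CLUSTERED INDEX IX_Subset_SaleId ON #Subset (SaleId);

-- Every self join now reads the small, indexed temp table.
SELECT m.SaleDate,
       SUM(m.Amount) AS TotalAmount,
       SUM(b.Amount) AS BananasAmount,
       SUM(d.Amount) AS DamagedBananasAmount
FROM   #Subset AS m
LEFT JOIN #Subset AS b
       ON b.SaleId = m.SaleId AND b.Product = 'Bananas' AND b.Quality = 100
LEFT JOIN #Subset AS d
       ON d.SaleId = m.SaleId AND d.Product = 'Bananas' AND d.Quality < 20
GROUP BY m.SaleDate;

DROP TABLE #Subset;
```

Whether the index pays off depends on how many rows end up in the subset and on what the real join conditions are.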

Related reading:

  • Which are more performant, CTE or temporary tables?
  • SQL 2005 CTE vs TEMP table Performance when used in joins of other tables
  • http://msdn.microsoft.com/en-us/magazine/cc163346.aspx#S3

Quote from the last link:

The CTE's underlying query will be called each time it is referenced in the immediately following query.
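One way to check this on your own query, if you are not comfortable reading execution plans, is to look at the I/O statistics. The sketch below is only an illustration: dbo.Sales and its columns are hypothetical, and the exact numbers reported will depend on your data and indexes.

```sql
-- Hypothetical table: dbo.Sales(SaleId, SaleDate, Product, Quality, Amount).
SET STATISTICS IO ON;

WITH DataCTE AS
(
    SELECT s.SaleId, s.Product, s.Quality, s.Amount
    FROM   dbo.Sales AS s
    WHERE  s.SaleDate >= '20100101' AND s.SaleDate < '20100201'
)
-- The CTE is referenced twice; b.SaleId is selected so the join to b
-- cannot simply be optimized away.
SELECT COUNT(*)        AS MainRows,
       COUNT(b.SaleId) AS BananaRows
FROM   DataCTE AS m
LEFT JOIN DataCTE AS b
       ON b.SaleId = m.SaleId AND b.Product = 'Bananas' AND b.Quality = 100;

SET STATISTICS IO OFF;
```

The messages output reports scan counts and logical reads per base table. Comparing those figures between the CTE version and the temp table version should show how much extra work the repeated references cause.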

I'd say go with the temp table. Unfortunately, the elegant solution isn't always the best one.
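If it has to stay a UDF, a multi-statement table-valued function might look roughly like the sketch below. A #temp table cannot be created inside a function, so the sketch uses a table variable to hold the subset instead. The function name, dbo.Sales, its columns, and the join on SaleId are all hypothetical.

```sql
-- Hypothetical schema: dbo.Sales(SaleId, SaleDate, Product, Quality, Amount).
CREATE FUNCTION dbo.GetSalesSubtotals (@From datetime, @To datetime)
RETURNS @Result TABLE
(
    SaleDate             datetime NOT NULL,
    TotalAmount          money    NULL,
    BananasAmount        money    NULL,
    DamagedBananasAmount money    NULL
)
AS
BEGIN
    -- The table variable plays the role of the temp table;
    -- the PRIMARY KEY gives it an index on SaleId.
    DECLARE @Subset TABLE
    (
        SaleId   int         NOT NULL PRIMARY KEY,
        SaleDate datetime    NOT NULL,
        Product  varchar(50) NOT NULL,
        Quality  int         NOT NULL,
        Amount   money       NOT NULL
    );

    -- Run the expensive date-filtered query exactly once.
    INSERT INTO @Subset (SaleId, SaleDate, Product, Quality, Amount)
    SELECT s.SaleId, s.SaleDate, s.Product, s.Quality, s.Amount
    FROM   dbo.Sales AS s
    WHERE  s.SaleDate >= @From AND s.SaleDate < @To;

    -- Build the subtotals from the stored subset.
    INSERT INTO @Result (SaleDate, TotalAmount, BananasAmount, DamagedBananasAmount)
    SELECT m.SaleDate, SUM(m.Amount), SUM(b.Amount), SUM(d.Amount)
    FROM   @Subset AS m
    LEFT JOIN @Subset AS b
           ON b.SaleId = m.SaleId AND b.Product = 'Bananas' AND b.Quality = 100
    LEFT JOIN @Subset AS d
           ON d.SaleId = m.SaleId AND d.Product = 'Bananas' AND d.Quality < 20
    GROUP BY m.SaleDate;

    RETURN;
END;
GO

-- Example call with an arbitrary date range.
SELECT * FROM dbo.GetSalesSubtotals('20100101', '20100201');
```

Keep in mind that table variables carry no column statistics, so with a large subset the optimizer's estimates may be off; it is worth timing this against the temp table version.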

UPDATE:

Hmm, that makes things more difficult. It's hard for me to say without looking at your whole environment.

Some thoughts:

  • Can you use a stored procedure instead of a UDF (instead of it, not called from within it)?
  • This may not be possible, but if you can remove the left join from your CTE, you could move the CTE's query into an indexed view. If you are able to do this, you may see performance gains over even the temp table.
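For the indexed view idea, a minimal sketch could look like the following. It assumes a hypothetical dbo.Sales(SaleId, SaleDate, StoreId, Product, Quality, Amount) table joined to a hypothetical dbo.Stores(StoreId, Region) table, with SaleId unique. Indexed views cannot contain outer joins, which is why the left join has to go, and they also require specific SET options (such as ANSI_NULLS and QUOTED_IDENTIFIER being ON) when they are created.

```sql
-- The view must be schema-bound and use two-part table names.
CREATE VIEW dbo.vSalesWithStore
WITH SCHEMABINDING
AS
SELECT s.SaleId, s.SaleDate, s.Product, s.Quality, s.Amount, st.Region
FROM   dbo.Sales AS s
INNER JOIN dbo.Stores AS st
        ON st.StoreId = s.StoreId;
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesWithStore
    ON dbo.vSalesWithStore (SaleDate, SaleId);
GO

-- The date filter is applied when querying, since a view takes no parameters.
-- NOEXPAND asks the optimizer to read the view's index directly.
SELECT v.SaleDate, SUM(v.Amount) AS TotalAmount
FROM   dbo.vSalesWithStore AS v WITH (NOEXPAND)
WHERE  v.SaleDate >= '20100101' AND v.SaleDate < '20100201'
GROUP BY v.SaleDate;
```

The trade-off is that the view's index has to be maintained on every insert, update, and delete against the base tables, so this suits data that is read far more often than it is written.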
Abe Miessler answered Nov 15 '22 22:11
