
Why is a CTE so slow compared to temp tables?

I have a simple stored procedure that uses multiple WITH clauses.

Part of the code looks like this:

WITH cteRowNums AS
(
    SELECT
        ROW_NUMBER() OVER (ORDER BY fcmp.EmpUserID, fcmp.WorkCellID, fcmp.ActivityTS) AS RowNumber,
        fcmp.ActivityTS,
        fcmp.ArtifactTypeID,
        fcmp.ServerDateID,
        fcmp.ServerHourID,
        fcmp.EmpUserID,
        fcmp.WorkCellID
    FROM dbo.FactCassetteMarkingProcessing fcmp
    WHERE ServerDateID >= '2007-01-01'
),
-- Make an attempt at identifying what each user did in their "session" by self-joining
cteJoinCurAndNext AS
(
    SELECT
        [Current Row].ArtifactTypeID,
        [Current Row].ServerDateID,
        [Current Row].ServerHourID,
        [Current Row].EmpUserID,
        [Current Row].WorkCellID
    FROM cteRowNums [Current Row]
    LEFT OUTER JOIN cteRowNums [Next Row]
        ON [Next Row].RowNumber = [Current Row].RowNumber + 1
    WHERE [Current Row].ArtifactTypeID = 2
       OR ([Current Row].ArtifactTypeID = 1
           AND [Next Row].ArtifactTypeID = 2
           AND [Current Row].EmpUserID = [Next Row].EmpUserID
           AND [Current Row].WorkCellID = [Next Row].WorkCellID)
),
-- Do some aggregations
cteAggregates AS
(
    SELECT
        EmpUserID,
        ServerDateID,
        ServerHourID,
        COUNT(NULLIF(ArtifactTypeID, 2)) AS SpecimensProcessedCount,
        COUNT(NULLIF(ArtifactTypeID, 1)) AS BlocksProcessedCount
    FROM cteJoinCurAndNext
    GROUP BY EmpUserID, ServerDateID, ServerHourID
)
SELECT * FROM cteAggregates

The problem is that this takes a very long time for approximately 2.5 million rows. I cancelled the query after 40 minutes.

If I replace this piece of code with a temporary table, the execution is much, much faster. Is there any way to get almost the same performance using just CTEs?

Mihai Alexandru-Ionut asked Jan 24 '23 20:01



1 Answer

There are two reasons.

Probably the more important reason is that SQL Server does not materialize CTEs. So, for every reference, SQL Server recalculates the entire CTE. In your query, cteRowNums is referenced twice in the self-join, so the ROW_NUMBER() over the whole filtered table is computed for each of those references. As far as I know, SQL Server also does not do common subquery optimizations on the execution DAG, so it always regenerates the CTEs (although the execution plans might be different for each instance).

The second reason is that temporary tables have statistics, and the optimizer can use these statistics to choose a better query plan.
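To make both points concrete, here is a minimal sketch of the temp-table variant, not a tuned solution. The names #RowNums and IX_RowNums_RowNumber are hypothetical, and the column types are simply whatever SELECT ... INTO infers from the source table. The row-numbered set is stored once, and the index gives the optimizer statistics on the join column.

```sql
-- Sketch only: materialize the row-numbered set once.
-- #RowNums is a hypothetical temp table name.
SELECT
    ROW_NUMBER() OVER (ORDER BY fcmp.EmpUserID, fcmp.WorkCellID, fcmp.ActivityTS) AS RowNumber,
    fcmp.ActivityTS,
    fcmp.ArtifactTypeID,
    fcmp.ServerDateID,
    fcmp.ServerHourID,
    fcmp.EmpUserID,
    fcmp.WorkCellID
INTO #RowNums
FROM dbo.FactCassetteMarkingProcessing AS fcmp
WHERE fcmp.ServerDateID >= '2007-01-01';

-- Index the join column; building the index also creates statistics on it.
-- IX_RowNums_RowNumber is a hypothetical index name.
CREATE UNIQUE CLUSTERED INDEX IX_RowNums_RowNumber ON #RowNums (RowNumber);

-- Same self-join and aggregation as the CTE version,
-- but both sides now read the stored rows.
SELECT
    cur.EmpUserID,
    cur.ServerDateID,
    cur.ServerHourID,
    COUNT(NULLIF(cur.ArtifactTypeID, 2)) AS SpecimensProcessedCount,
    COUNT(NULLIF(cur.ArtifactTypeID, 1)) AS BlocksProcessedCount
FROM #RowNums AS cur
LEFT OUTER JOIN #RowNums AS nxt
    ON nxt.RowNumber = cur.RowNumber + 1
WHERE cur.ArtifactTypeID = 2
   OR (cur.ArtifactTypeID = 1
       AND nxt.ArtifactTypeID = 2
       AND cur.EmpUserID = nxt.EmpUserID
       AND cur.WorkCellID = nxt.WorkCellID)
GROUP BY cur.EmpUserID, cur.ServerDateID, cur.ServerHourID;

DROP TABLE #RowNums;
```

Whether the clustered index pays for itself depends on your data, so it is worth comparing the actual execution plans with and without it.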

I suspect that you can simplify the logic. However, you would need to ask a new question with an explanation of what you want to do, along with sample data and desired results.
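Without knowing the intended result, one possible simplification is to replace the self-join with LEAD(), so the CTE is referenced only once. This is a sketch under assumptions: it needs SQL Server 2012 or later, it assumes EmpUserID and WorkCellID are never NULL, and cteWithNext and NextArtifactTypeID are hypothetical names. Partitioning by user and work cell stands in for the equality checks in the original join condition.

```sql
-- Sketch only: look at the next row with LEAD() instead of joining
-- the row-numbered CTE to itself.
WITH cteWithNext AS
(
    SELECT
        fcmp.ArtifactTypeID,
        fcmp.ServerDateID,
        fcmp.ServerHourID,
        fcmp.EmpUserID,
        -- Next artifact type for the same user and work cell, by timestamp;
        -- NULL when the row is the last one in its partition.
        LEAD(fcmp.ArtifactTypeID) OVER (
            PARTITION BY fcmp.EmpUserID, fcmp.WorkCellID
            ORDER BY fcmp.ActivityTS
        ) AS NextArtifactTypeID
    FROM dbo.FactCassetteMarkingProcessing AS fcmp
    WHERE fcmp.ServerDateID >= '2007-01-01'
)
SELECT
    EmpUserID,
    ServerDateID,
    ServerHourID,
    COUNT(NULLIF(ArtifactTypeID, 2)) AS SpecimensProcessedCount,
    COUNT(NULLIF(ArtifactTypeID, 1)) AS BlocksProcessedCount
FROM cteWithNext
WHERE ArtifactTypeID = 2
   OR (ArtifactTypeID = 1 AND NextArtifactTypeID = 2)
GROUP BY EmpUserID, ServerDateID, ServerHourID;
```

If rows can tie on ActivityTS within a user and work cell, the "next" row is not deterministic here, just as it is not with the original ROW_NUMBER() ordering.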

Gordon Linoff answered Feb 04 '23 19:02
