
SQL Server - OUTER APPLY versus subqueries [closed]

Please consider the following two statements in SQL Server:

This one uses nested subqueries:

    WITH cte AS (
        SELECT TOP 100 PERCENT *
        FROM Segments
        ORDER BY InvoiceDetailID, SegmentID
    )
    SELECT *, ReturnDate =
                (SELECT TOP 1 cte.DepartureInfo
                    FROM cte
                    WHERE seg.InvoiceDetailID = cte.InvoiceDetailID
                        AND cte.SegmentID > seg.SegmentID),
            DepartureCityCode =
                (SELECT TOP 1 cte.DepartureCityCode
                    FROM cte
                    WHERE seg.InvoiceDetailID = cte.InvoiceDetailID
                        AND cte.SegmentID > seg.SegmentID)
    FROM Segments seg
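A minimal sketch of what the first statement is meant to return, assuming the intent is to pair each segment with the next segment of the same invoice. It runs in SQLite through Python's `sqlite3` module instead of SQL Server, so `TOP 1` is written as `ORDER BY ... LIMIT 1`, with an explicit sort on `SegmentID` that the original subqueries leave to the CTE. The table layout and sample rows are hypothetical; only the column names come from the question. It illustrates results, not SQL Server's plan or performance.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
ROWS = [
    (1, 10, "2011-03-01", "JFK"),
    (2, 10, "2011-03-05", "LHR"),
    (3, 10, "2011-03-09", "CDG"),
    (4, 20, "2011-04-01", "SFO"),
    (5, 20, "2011-04-07", "NRT"),
    (6, 30, "2011-05-02", "BOS"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Segments (SegmentID INTEGER PRIMARY KEY, InvoiceDetailID INTEGER, "
    "DepartureInfo TEXT, DepartureCityCode TEXT)"
)
conn.executemany("INSERT INTO Segments VALUES (?, ?, ?, ?)", ROWS)

# Two correlated scalar subqueries, one per returned column.
# SQLite has no TOP, so TOP 1 becomes ORDER BY ... LIMIT 1; the explicit
# ORDER BY makes "the next segment" deterministic.
TWO_SUBQUERIES = """
SELECT seg.SegmentID,
       (SELECT nxt.DepartureInfo FROM Segments AS nxt
         WHERE nxt.InvoiceDetailID = seg.InvoiceDetailID
           AND nxt.SegmentID > seg.SegmentID
         ORDER BY nxt.SegmentID LIMIT 1) AS ReturnDate,
       (SELECT nxt.DepartureCityCode FROM Segments AS nxt
         WHERE nxt.InvoiceDetailID = seg.InvoiceDetailID
           AND nxt.SegmentID > seg.SegmentID
         ORDER BY nxt.SegmentID LIMIT 1) AS DepartureCityCode
FROM Segments AS seg
ORDER BY seg.SegmentID
"""

result = conn.execute(TWO_SUBQUERIES).fetchall()
for row in result:
    print(row)
```

Segments 3, 5 and 6 have no later segment in their invoice, so both subqueries return NULL (`None` in Python) for them; OUTER APPLY likewise returns NULLs for outer rows whose applied subquery finds nothing.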

And this one uses the OUTER APPLY operator:

    WITH cte AS (
        SELECT TOP 100 PERCENT *
        FROM Segments
        ORDER BY InvoiceDetailID, SegmentID
    )
    SELECT seg.*, t.DepartureInfo AS ReturnDate, t.DepartureCityCode
    FROM Segments seg OUTER APPLY (
                SELECT TOP 1 cte.DepartureInfo, cte.DepartureCityCode
                FROM cte
                WHERE seg.InvoiceDetailID = cte.InvoiceDetailID
                    AND cte.SegmentID > seg.SegmentID
            ) t

Which of these two would potentially perform better, considering that the Segments table can have millions of rows?

My intuition is that OUTER APPLY would perform better.

A couple more questions:

  1. I am almost sure about this, but I still want to confirm: in the first solution, would the CTE effectively be executed twice (because it is referenced twice and a CTE is expanded inline like a macro)?
  2. Would the CTE be executed once for each row when used with the OUTER APPLY operator? And would it also be executed for each row when used in the nested subqueries of the first statement?
r_honey asked Mar 25 '11 15:03


2 Answers

First, get rid of the TOP 100 PERCENT in the CTE. You are not really using TOP here: it only exists so that the ORDER BY inside the CTE is allowed, and that ORDER BY does not guarantee the order of the final results. If you want the results sorted, add an ORDER BY to the end of the entire statement. Second, to address your question about performance, if forced to make a guess, my bet would be on the second form, only because it has a single subquery instead of two. Third, another form you might try is:

    With RankedSegments As
        (
        Select S1.SegmentId, ...
            , Row_Number() Over( Partition By S1.SegmentId Order By S2.SegmentId ) As Num
        From Segments As S1
            Left Join Segments As S2
                On S2.InvoiceDetailId = S1.InvoiceDetailId
                    And S2.SegmentId > S1.SegmentID
        )
    Select ...
    From RankedSegments
    Where Num = 1
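The ranked self-join can be checked against a plain Python reference on hypothetical data. This sketch runs in SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or later), fills the select list elided with `...` with the question's columns as an assumption, and uses a hypothetical helper `next_segment_oracle`; it compares results only, not performance.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
ROWS = [
    (1, 10, "2011-03-01", "JFK"),
    (2, 10, "2011-03-05", "LHR"),
    (3, 10, "2011-03-09", "CDG"),
    (4, 20, "2011-04-01", "SFO"),
    (5, 20, "2011-04-07", "NRT"),
    (6, 30, "2011-05-02", "BOS"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Segments (SegmentID INTEGER PRIMARY KEY, InvoiceDetailID INTEGER, "
    "DepartureInfo TEXT, DepartureCityCode TEXT)"
)
conn.executemany("INSERT INTO Segments VALUES (?, ?, ?, ?)", ROWS)


def next_segment_oracle(rows):
    """Hypothetical pure-Python reference: pair each segment with the next one in its invoice."""
    ordered = sorted(rows)  # sorts by SegmentID, the first field
    out = []
    for seg_id, invoice, _, _ in ordered:
        later = [r for r in ordered if r[1] == invoice and r[0] > seg_id]
        out.append((seg_id, later[0][2], later[0][3]) if later else (seg_id, None, None))
    return out


# Ranked self-join; the select list behind "..." is an assumption.
RANKED = """
WITH RankedSegments AS (
    SELECT S1.SegmentID, S2.DepartureInfo AS ReturnDate, S2.DepartureCityCode,
           ROW_NUMBER() OVER (PARTITION BY S1.SegmentID ORDER BY S2.SegmentID) AS Num
    FROM Segments AS S1
    LEFT JOIN Segments AS S2
      ON S2.InvoiceDetailID = S1.InvoiceDetailID
     AND S2.SegmentID > S1.SegmentID
)
SELECT SegmentID, ReturnDate, DepartureCityCode
FROM RankedSegments
WHERE Num = 1
ORDER BY SegmentID
"""

ranked = conn.execute(RANKED).fetchall()
print(ranked == next_segment_oracle(ROWS))  # True for this sample data
```

A segment with no later segment still gets exactly one row (with NULLs) because of the LEFT JOIN. Note that the join logically produces every (segment, later segment) pair within an invoice before `Num = 1` keeps one, so the intermediate row count grows with the square of the number of segments per invoice.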

Another possibility:

    With MinSegments As
        (
        Select S1.SegmentId, Min(S2.SegmentId) As MinSegmentId
        From Segments As S1
            Join Segments As S2
                On S2.InvoiceDetailId = S1.InvoiceDetailId
                    And S2.SegmentId > S1.SegmentID
        Group By S1.SegmentId
        )
    Select ...
    From Segments As S1
        Left Join (MinSegments As MS1
            Join Segments As S2
                On S2.SegmentId = MS1.MinSegmentId)
            On MS1.SegmentId = S1.SegmentId
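A minimal sketch of the MIN-based form on hypothetical data, run in SQLite through Python's `sqlite3` (no window functions needed). Assumptions: the elided select list is filled with the question's columns, and the nested `Left Join (MinSegments ... Join Segments ...)` is flattened into two LEFT JOINs. That gives the same rows here, because when MinSegments has no row for a segment, `MinSegmentID` is NULL and the second join also finds nothing.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
ROWS = [
    (1, 10, "2011-03-01", "JFK"),
    (2, 10, "2011-03-05", "LHR"),
    (3, 10, "2011-03-09", "CDG"),
    (4, 20, "2011-04-01", "SFO"),
    (5, 20, "2011-04-07", "NRT"),
    (6, 30, "2011-05-02", "BOS"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Segments (SegmentID INTEGER PRIMARY KEY, InvoiceDetailID INTEGER, "
    "DepartureInfo TEXT, DepartureCityCode TEXT)"
)
conn.executemany("INSERT INTO Segments VALUES (?, ?, ?, ?)", ROWS)

# Step 1 (CTE): for each segment, the smallest later SegmentID in the same invoice.
# Step 2: join back to Segments to fetch that next segment's columns.
MIN_FORM = """
WITH MinSegments AS (
    SELECT S1.SegmentID, MIN(S2.SegmentID) AS MinSegmentID
    FROM Segments AS S1
    JOIN Segments AS S2
      ON S2.InvoiceDetailID = S1.InvoiceDetailID
     AND S2.SegmentID > S1.SegmentID
    GROUP BY S1.SegmentID
)
SELECT S1.SegmentID, S2.DepartureInfo AS ReturnDate, S2.DepartureCityCode
FROM Segments AS S1
LEFT JOIN MinSegments AS MS1 ON MS1.SegmentID = S1.SegmentID
LEFT JOIN Segments AS S2 ON S2.SegmentID = MS1.MinSegmentID
ORDER BY S1.SegmentID
"""

min_rows = conn.execute(MIN_FORM).fetchall()
for row in min_rows:
    print(row)
```

Only `SegmentID` values are aggregated in the CTE, and the wider columns are fetched once per segment in the final join.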
Thomas answered Nov 15 '22 04:11


I may use this variation of Thomas's query:

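    -- If SegmentID is unique, each partition holds one row, so Num is always 1 and
    -- the join can match every later segment of the invoice, not only the next one.
    WITH cte AS (
        SELECT *, Row_Number() Over( Partition By SegmentId Order By InvoiceDetailID, SegmentId ) As Num
        FROM Segments)
    SELECT seg.*, t.DepartureInfo AS ReturnDate, t.DepartureCityCode
    FROM Segments seg LEFT JOIN cte t ON seg.InvoiceDetailID = t.InvoiceDetailID
        AND t.SegmentID > seg.SegmentID AND t.Num = 1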
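A minimal sketch of how this variant behaves when `SegmentID` is unique, as it is in the hypothetical table below, run in SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or later). Because every `SegmentID` partition holds a single row, `Num` is always 1, so the `t.Num = 1` condition filters nothing and a segment is joined to each later segment of its invoice.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
ROWS = [
    (1, 10, "2011-03-01", "JFK"),
    (2, 10, "2011-03-05", "LHR"),
    (3, 10, "2011-03-09", "CDG"),
    (4, 20, "2011-04-01", "SFO"),
    (5, 20, "2011-04-07", "NRT"),
    (6, 30, "2011-05-02", "BOS"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Segments (SegmentID INTEGER PRIMARY KEY, InvoiceDetailID INTEGER, "
    "DepartureInfo TEXT, DepartureCityCode TEXT)"
)
conn.executemany("INSERT INTO Segments VALUES (?, ?, ?, ?)", ROWS)

# The variant as posted: ROW_NUMBER is computed per SegmentID *before* the join,
# so with a unique SegmentID every row gets Num = 1.
VARIANT = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY SegmentID
                                 ORDER BY InvoiceDetailID, SegmentID) AS Num
    FROM Segments
)
SELECT seg.SegmentID, t.DepartureInfo AS ReturnDate, t.DepartureCityCode
FROM Segments AS seg
LEFT JOIN cte AS t
  ON seg.InvoiceDetailID = t.InvoiceDetailID
 AND t.SegmentID > seg.SegmentID
 AND t.Num = 1
ORDER BY seg.SegmentID, t.SegmentID
"""

variant = conn.execute(VARIANT).fetchall()
print(len(variant), "rows for", len(ROWS), "segments")  # 7 rows for 6 segments
print(variant[:2])  # segment 1 is matched with both segment 2 and segment 3
```

Computing the row number after the join, as in Thomas's RankedSegments query, is what restricts each segment to its next segment.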
r_honey answered Nov 15 '22 03:11
