Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Performance using ROW_NUMBER and dynamic order by

Tags:

sql-server

I realise this is a common issue and has been discussed on SO previously, but I thought I would raise the question again in the hope that some viable alternative can be found.

Take the following SQL which combines paging with dynamic ordering:

WITH CTE AS (
  SELECT 
      OrderID,
      ROW_NUMBER() OVER (ORDER BY
        CASE WHEN @SortCol='OrderID' THEN OrderID END ASC,
        CASE WHEN @SortCol='CustomerName' THEN Surname END ASC 
      ) AS ROW_ID
  FROM Orders WHERE X
)

SELECT Orders.* FROM CTE
INNER JOIN Orders ON CTE.OrderID = Orders.OrderID
WHERE ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount -1;

As is well known, the ROW_NUMBER() method does not work well on large tables since the Indexes on the table cannot be properly used when using multiple CASE statements in the ORDER BY clause (see link).

The solution we've been using for a number of years is to construct a string which is then executed using sp_executesql. Performance is good when using dynamic SQL like this but the resulting code is awful from a legibility point of view.

I have heard of the ROWCOUNT method but as far as I'm aware it is still susceptible to the same issues when you introduce the dynamic order by element.

So, at the risk of asking the impossible, what other options are there?

EDIT

In order to make some useful progress here I have put together three queries highlighting the various suggested methods:

  1. Current, Dynamic SQL solution (execution time 147ms)

  2. gbn Solution (execution time 1687ms)

  3. Anders Solution (execution time 1604ms)

  4. Muhmud Solution (execution time 46ms)

like image 461
cusimar9 Avatar asked Apr 04 '13 15:04

cusimar9


2 Answers

(Edited)

DECLARE 
      @OrderColumnName SYSNAME
    , @RowStart INT
    , @RowCount INT
    , @TopCount INT

SELECT 
      @OrderColumnName = 'EmployeeID'
    , @RowStart = 5
    , @RowCount = 50
    , @TopCount = @RowStart + @RowCount – 1

@muhmud's solution -

; WITH data AS 
(
    SELECT 
          wo.WorkOutID
        , RowIDByEmployee = ROW_NUMBER() OVER (ORDER BY wo.EmployeeID)
        , RowIDByDateOut = ROW_NUMBER() OVER (ORDER BY wo.DateOut)  
    FROM dbo.WorkOut wo
), CTE AS 
(         
    SELECT
          wo.WorkOutID
        , RowID = RowIDByEmployee 
    FROM data wo
    WHERE @OrderColumnName = 'EmployeeID'

    UNION ALL

    SELECT
          wo.WorkOutID
        , RowID = RowIDByDateOut
    FROM data wo
    WHERE @OrderColumnName = 'DateOut'
)
SELECT wo.*  
FROM CTE t
JOIN dbo.WorkOut wo ON t.WorkOutID = wo.WorkOutID
WHERE t.RowID BETWEEN @RowStart AND @RowCount + @RowStart - 1
ORDER BY t.RowID
OPTION (RECOMPILE)

Table 'WorkOut'. Scan count 3, logical reads 14254, physical reads 1,
read-ahead reads 14017, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0,
physical reads 0, read-ahead reads 0, lob logical reads 0, lob
physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 1295 ms,  elapsed time = 3048 ms.

Solution without data common table expression -

;WITH CTE AS 
(         
    SELECT
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.EmployeeID) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'EmployeeID'

    UNION ALL

    SELECT
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.DateOut) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'DateOut'
)
SELECT wo.*  
FROM CTE t
JOIN dbo.WorkOut wo ON t.WorkOutID = wo.WorkOutID
WHERE t.RowID BETWEEN @RowStart AND @RowCount + @RowStart - 1
ORDER BY t.RowID
OPTION (RECOMPILE)

Table 'WorkOut'. Scan count 3, logical reads 14254, physical reads 1, read-ahead reads 14017, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1296 ms,  elapsed time = 3049 ms.

Solution with TOP -

;WITH CTE AS 
(         
    SELECT TOP (@TopCount)
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.EmployeeID) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'EmployeeID'

    UNION ALL

    SELECT TOP (@TopCount)
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.DateOut) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'DateOut'
)
SELECT wo.*  
FROM CTE t
JOIN dbo.WorkOut wo ON t.WorkOutID = wo.WorkOutID
WHERE t.RowID > @RowStart - 1
ORDER BY t.RowID
OPTION (RECOMPILE)

Table 'WorkOut'. Scan count 3, logical reads 14246, physical reads 1, read-ahead reads 14017, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1248 ms,  elapsed time = 2864 ms.

enter image description here

like image 23
Devart Avatar answered Oct 21 '22 22:10

Devart


How about this:

WITH data as (
    SELECT OrderID,
            ROW_NUMBER() OVER ( ORDER BY OrderID asc) as OrderID_ROW_ID,
            ROW_NUMBER() OVER ( ORDER BY Surname asc) as Surname_ROW_ID
        FROM Orders --WHERE X   
), CTE AS (               
        SELECT OrderID, OrderID_ROW_ID as ROW_ID
        FROM data
        where @SortCol = 'OrderID'

        union all

        SELECT OrderID, Surname_ROW_ID
        FROM data
        where @SortCol = 'Surname'
)
SELECT Orders.*, ROW_ID FROM CTE
INNER JOIN Orders ON CTE.OrderID = Orders.OrderID       
WHERE ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount -1
order by ROW_ID
option (recompile);

Edit: Using option (recompile) on the example query in the post makes it go much quicker. However, case could not be used in exactly that way to choose between ascending/descending order.

The reason for this is that a plan is being generated for values of the variables that are inappropriate, and then this plan is cached. Forcing a recompile allows it to use the actual values of the variables.

like image 145
muhmud Avatar answered Oct 21 '22 22:10

muhmud