ROW_NUMBER() OVER PARTITION optimization

Question

I have the following query:

SELECT *
FROM
(
    SELECT *,
        ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Price ASC) as RowNum
    from Offers) r
where RowNum = 1

Offers table contains about 10 million records. But there are only ~4000 distinct codes there. So I need to get the row with the lowest price for each code and there will be only 4000 rows in the result.

I have an Index on (Code, Price) columns with all other columns in INCLUDE statement.

The query runs for 2 minutes. And if I look at the execution plan, I see Index scan with 10M actual rows. So, I guess it scans the whole index to get needed values.

Why MSSQL do the whole index scan? Is it because subquery needs the whole data? How to avoid this scan? Is there a SQL hint to process only the first row in partition?

Is there another way to optimize such query?

Nick P. · Accepted Answer

After trying multiple different solutions, I've found the fastest query with CROSS APPLY statement:

SELECT C.* 
FROM (SELECT DISTINCT Code from Offers) A
CROSS APPLY (SELECT TOP 1 * 
             FROM Offers B
             WHERE A.Code  = B.Code 
             ORDER by Price) C

It take ~1 second to run.

HABO · Answer

Try creating an index on ( Code, Price ) without including the other columns and then (assuming that there is a unique Id column):

select L.*
  from Offers as L inner join
  ( select Id,
      Row_Number() over ( partition by Code order by Price ) as RN
    from Offers ) as R on R.Id = L.Id and R.RN = 1

An index scan on a smaller index ought to help.

Second guess would be to get the Id of the row with the lowest Price for each Code explicitly: Get distinct Code values, get Id of top 1 (to avoid problems with duplicate prices) Min( Price ) row for that Code, join with Offers to get complete rows. Again, the more compact index should help.

ROW_NUMBER() OVER PARTITION optimization

Tags:

tsql

sql-server-2012

query-performance

Nick P.

2 Answers

Nick P.

HABO

Recent Activity

Donate For Us

ROW_NUMBER() OVER PARTITION optimization

Tags:

tsql

sql-server-2012

query-performance

Nick P.

2 Answers

Nick P.

HABO

Related questions

Recent Activity

Donate For Us