Multiple Row_Number() Calls in a Single SQL Query

I'm trying to set up some data to calculate multiple medians in SQL Server 2008, but I'm having a performance problem. Right now, I'm using this pattern (another example at the bottom). Yes, I'm not using a CTE, but using one won't fix the problem I'm having anyway. The performance is poor because the ROW_NUMBER sub-queries run in serial, not in parallel.

Here's a full example. Below the SQL I explain the problem more.

-- build the example table    

CREATE TABLE #TestMedian (
    StateID INT,
    TimeDimID INT,
    ConstructionStatusID INT,

    PopulationSize BIGINT,
    SquareMiles BIGINT
);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 100000, 200000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 200000, 300000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 300000, 400000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 100000, 200000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 250000, 300000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 350000, 400000);

-- TRUNCATE TABLE TestMedian

    SELECT
        StateID
        ,TimeDimID
        ,ConstructionStatusID
        ,NumberOfRows = COUNT(*) OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID)
        ,PopulationSizeRowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID ORDER BY PopulationSize)
        ,SquareMilesRowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID ORDER BY SquareMiles)
        ,PopulationSize
        ,SquareMiles
    INTO #MedianData
    FROM #TestMedian

    SELECT MinRowNum = MIN(PopulationSizeRowNum), MaxRowNum = MAX(PopulationSizeRowNum), StateID, TimeDimID, ConstructionStatusID, MedianPopulationSize= AVG(PopulationSize) 
    FROM #MedianData T
    WHERE PopulationSizeRowNum IN((NumberOfRows + 1) / 2, (NumberOfRows + 2) / 2)
    GROUP BY StateID, TimeDimID, ConstructionStatusID

    SELECT MinRowNum = MIN(SquareMilesRowNum), MaxRowNum = MAX(SquareMilesRowNum), StateID, TimeDimID, ConstructionStatusID, MedianSquareMiles= AVG(SquareMiles) 
    FROM #MedianData T
    WHERE SquareMilesRowNum IN((NumberOfRows + 1) / 2, (NumberOfRows + 2) / 2)
    GROUP BY StateID, TimeDimID, ConstructionStatusID


    DROP TABLE #MedianData
    DROP TABLE #TestMedian

The problem with this query is that SQL Server executes both of the "ROW_NUMBER() OVER..." sub-queries in serial, not in parallel. So if I have 10 of these ROW_NUMBER calculations, it'll calculate them one after the other and the run time grows linearly with the number of calculations, which stinks. I'm running this query on an 8-way, 32GB system and I would love some parallelism. I'm trying to run this type of query on a 5,000,000 row table.

I can tell it's doing this by looking at the query plan and seeing the Sorts in the same execution path (displaying the query plan's XML wouldn't work real well on SO).

So my question is this: How can I alter this query so that the ROW_NUMBER queries are executed in parallel? Is there a completely different technique I can use to prepare the data for multiple median calculations?

269

asked Sep 04 '09 16:09

JayRu

1 Answer

Each ROW_NUMBER requires the rows to be sorted first. Since your two RNs have different ORDER BY conditions, the query must produce the result, order it for the first RN (it may already be in that order), produce the first RN, then order it again for the second RN and produce the second RN. There simply isn't any magic pixie dust that can materialize a row number value without counting where the row is in the required order.
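To make the one-sort-per-ROW_NUMBER point concrete, here is a minimal sketch (not a fix) that computes each median in its own statement against the #TestMedian table from the question. Each statement carries a single ROW_NUMBER, so each needs only its own ordering. The derived-table alias R and the column alias RowNum are names made up for this illustration. The total sorting work is not reduced, and whether the optimizer picks a parallel plan for either statement is an assumption you would have to confirm in the actual execution plan.

```sql
-- Sketch only: one ROW_NUMBER() per statement, so each statement
-- needs just the one ordering that its own ORDER BY requires.
-- Assumes #TestMedian from the question exists and is populated.

-- Median of PopulationSize per group
SELECT StateID, TimeDimID, ConstructionStatusID,
       MedianPopulationSize = AVG(PopulationSize)
FROM (
    SELECT StateID, TimeDimID, ConstructionStatusID, PopulationSize,
           NumberOfRows = COUNT(*) OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID),
           RowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID
                                       ORDER BY PopulationSize)
    FROM #TestMedian
) AS R  -- R and RowNum are hypothetical names for this sketch
-- Odd row count: both expressions pick the middle row.
-- Even row count: they pick the two middle rows, which AVG then averages.
WHERE RowNum IN ((NumberOfRows + 1) / 2, (NumberOfRows + 2) / 2)
GROUP BY StateID, TimeDimID, ConstructionStatusID;

-- Median of SquareMiles per group (same shape, different ORDER BY)
SELECT StateID, TimeDimID, ConstructionStatusID,
       MedianSquareMiles = AVG(SquareMiles)
FROM (
    SELECT StateID, TimeDimID, ConstructionStatusID, SquareMiles,
           NumberOfRows = COUNT(*) OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID),
           RowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID
                                       ORDER BY SquareMiles)
    FROM #TestMedian
) AS R
WHERE RowNum IN ((NumberOfRows + 1) / 2, (NumberOfRows + 2) / 2)
GROUP BY StateID, TimeDimID, ConstructionStatusID;
```

With the six sample rows from the question, the group (1, 1, 1) has NumberOfRows = 6, so rows 3 and 4 are selected in each ordering: the population median is the average of 200000 and 250000 (225000) and the square-miles median is the average of 300000 and 300000 (300000). Because the statements are independent, they could also be submitted from separate connections, but that is a design choice outside anything the query itself guarantees.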

196

answered Sep 18 '22 23:09

Remus Rusanu

Tags: sql, sql-server, tsql