Is there a performance difference in using a GROUP BY with MAX() as the aggregate vs ROW_NUMBER over partition by?

Tags:

sql

sql-server-2008

group-by

database-partitioning

Is there a performance difference between the following two queries, and if so, which one is better?

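    select
        q.id,
        q.name
    from (
        -- number each name's rows from the highest id down
        select id, name,
               row_number() over (partition by name order by id desc) as row_num
        from [table]  -- TABLE is a reserved word, so the placeholder name is bracketed
    ) q
    where q.row_num = 1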

versus

    select
        max(id) as id,
        name
    from [table]
    group by name

(The result set should be the same)

This assumes that no indexes are defined on the table.

UPDATE: I tested this, and the group by was faster.
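The claim that both queries return the same result set can be checked with a small sketch. It assumes Python's built-in `sqlite3` module backed by SQLite 3.25 or later (needed for window functions) as a stand-in for SQL Server, and a hypothetical table named `items`. It says nothing about relative speed on SQL Server; it only shows that the rows match.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for SQL Server. It can show that the two
# queries agree, not how they perform. "items" is a hypothetical table name.
ROW_NUMBER_QUERY = """
    SELECT q.id, q.name
    FROM (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS row_num
        FROM items
    ) AS q
    WHERE q.row_num = 1
"""

GROUP_BY_QUERY = """
    SELECT MAX(id) AS id, name
    FROM items
    GROUP BY name
"""


def latest_id_per_name(conn, query):
    """Run a query and return its (id, name) rows sorted by name, then id."""
    return sorted(conn.execute(query).fetchall(), key=lambda r: (r[1], r[0]))


def build_sample_db():
    """Create an in-memory table with several ids per name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER NOT NULL, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(1, "a"), (4, "a"), (2, "b"), (7, "b"), (5, "b"), (3, "c")],
    )
    return conn


conn = build_sample_db()
print(latest_id_per_name(conn, ROW_NUMBER_QUERY))
print(latest_id_per_name(conn, GROUP_BY_QUERY))
```

Both lines should list the highest id for each name: 4 for "a", 7 for "b" and 3 for "c".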


asked Jun 27 '12 18:06

Marina

3 Answers

I had a table of about 4.5M rows, and I wrote and tested both a MAX with GROUP BY solution and a ROW_NUMBER solution. The MAX version required two clustered index scans of the table, one to aggregate and a second to join back to the rest of the columns, whereas ROW_NUMBER needed only one. (Either or both could be indexed to reduce IO, but the point is that when you need columns beyond the grouped and aggregated ones, the GROUP BY approach requires two index scans.)

This seems to match the other answers here.
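A minimal sketch of the two query shapes this answer compares, assuming the same SQLite stand-in (3.25 or later) and a hypothetical `items` table with an extra `payload` column: with MAX and GROUP BY, the aggregate has to be joined back to the table to fetch `payload`, while ROW_NUMBER carries every column through a single pass. Only the query shapes carry over to SQL Server, not the plans.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for SQL Server; "items" and its "payload"
# column are hypothetical.

# MAX + GROUP BY only yields (name, max_id), so any other column requires
# joining the aggregate back to the table, which is a second pass over it.
JOIN_BACK_QUERY = """
    SELECT t.id, t.name, t.payload
    FROM items AS t
    JOIN (SELECT name, MAX(id) AS max_id FROM items GROUP BY name) AS m
      ON t.name = m.name AND t.id = m.max_id
"""

# ROW_NUMBER keeps every column available in one pass over the table.
ROW_NUMBER_QUERY = """
    SELECT id, name, payload
    FROM (
        SELECT id, name, payload,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS row_num
        FROM items
    ) AS q
    WHERE q.row_num = 1
"""


def fetch_sorted(conn, query):
    """Run a query and return its rows in a deterministic order."""
    return sorted(conn.execute(query).fetchall())


conn = sqlite3.connect(":memory:")
# id is the primary key, so the join-back cannot return duplicate rows per name.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL, payload TEXT)"
)
conn.executemany(
    "INSERT INTO items (id, name, payload) VALUES (?, ?, ?)",
    [(1, "a", "old a"), (4, "a", "new a"), (2, "b", "old b"), (7, "b", "new b")],
)

print(fetch_sorted(conn, JOIN_BACK_QUERY))
print(fetch_sorted(conn, ROW_NUMBER_QUERY))
```

One design note: the join-back returns one row per name only because `id` is unique here. If the maximum `id` could repeat within a name, the join would return every tied row, whereas filtering on `row_num = 1` always keeps exactly one row per name.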

answered Oct 07 '22 20:10

Robert Sievers

The group by should be faster. The row number has to be assigned to every row in the table, and this happens before the unwanted rows are filtered out.

The second query is, by far, the better construct. In the first, you have to make sure that the columns in the partition clause match the columns you want. More importantly, "group by" is a well-understood construct in SQL. I would also speculate that the group by might make better use of indexes, but that is only speculation.
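To illustrate the point about numbering every row before filtering, this sketch (same SQLite 3.25+ stand-in, hypothetical `items` table) counts the rows the inner ROW_NUMBER query produces, the rows that survive `row_num = 1`, and the rows GROUP BY returns. These are logical row counts only; they do not show what work a particular optimizer actually performs.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window functions) as a stand-in for SQL Server;
# "items" is a hypothetical table. These are logical row counts, not a plan.
NUMBERED = """
    SELECT id, name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS row_num
    FROM items
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(1, "a"), (4, "a"), (2, "b"), (7, "b"), (5, "b"), (3, "c")],
)


def count_rows(query):
    """Return the number of rows a query produces."""
    return conn.execute(f"SELECT COUNT(*) FROM ({query}) AS q").fetchone()[0]


# Every input row receives a row number...
numbered_rows = count_rows(NUMBERED)
# ...and only the outer filter discards everything except row_num = 1.
kept_rows = conn.execute(
    f"SELECT COUNT(*) FROM ({NUMBERED}) AS q WHERE q.row_num = 1"
).fetchone()[0]
# GROUP BY collapses each name to a single row as part of the aggregation.
grouped_rows = count_rows("SELECT MAX(id), name FROM items GROUP BY name")

print(numbered_rows, kept_rows, grouped_rows)  # prints: 6 3 3
```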

answered Oct 07 '22 20:10

Gordon Linoff

I'd use the group by name.

There is little difference when the index is name, id DESC (Plan 1), but if the index is declared as name, id ASC (Plan 2), then on SQL Server 2008 I see that the ROW_NUMBER version is unable to use this index and gets a sort operation, whereas the GROUP BY version is able to use a backward index scan to avoid the sort.

You'd need to check the plans on your version of SQL Server and with your data and indexes to be sure.
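Checking the plans can be sketched with SQLite's `EXPLAIN QUERY PLAN` against a hypothetical index on `(name, id)`, mirroring the ascending case above. This is a stand-in under stated assumptions: SQLite's planner is not SQL Server's, so the output will not reproduce Plan 1 or Plan 2, and only the habit of inspecting the plan carries over. On SQL Server itself you would inspect the actual execution plan (for example with `SET STATISTICS XML ON`, or the actual execution plan in SSMS) and look for a Sort operator.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for SQL Server; "items" and the index
# name "ix_items_name_id" are hypothetical.
ROW_NUMBER_QUERY = """
    SELECT q.id, q.name
    FROM (
        SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS row_num
        FROM items
    ) AS q
    WHERE q.row_num = 1
"""

GROUP_BY_QUERY = "SELECT MAX(id) AS id, name FROM items GROUP BY name"


def plan(conn, query):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN step."""
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER NOT NULL, name TEXT NOT NULL)")
# Ascending index on (name, id), like the "name, id ASC" case in the answer.
conn.execute("CREATE INDEX ix_items_name_id ON items (name, id)")

for label, query in [("ROW_NUMBER", ROW_NUMBER_QUERY), ("GROUP BY", GROUP_BY_QUERY)]:
    print(label)
    for step in plan(conn, query):
        print("   ", step)
```

In SQLite's output, a step such as `USE TEMP B-TREE FOR ORDER BY` is how an explicit sort is reported; the exact steps depend on the SQLite version.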

answered Oct 07 '22 21:10

Martin Smith

Related questions

SQL Query to get misc column information

In PostgreSQL, can we directly compare two timestamp with different time zone?

Inserting a COALESCE(NULL,default)

python cx_oracle cursor.rowcount returning 0 but cursor.fetchall returns data

sql query : show name with all vowels

how to find the json size stored in a column of postgres

Self-referencing constraint in MS SQL

How to select one row randomly taking into account a weight?

Handling Complex WHERE clauses with a PHP Query Builder

SQL Server Permissions on Stored Procs with dynamic SQL

Why can't I use a bit field as a boolean expression in a SQL case statement?

Inserting into DB with parameters safe from SQL injection?

SELECT INTO behavior and the IDENTITY property

Select parent row only if it has no children

Filter data in SQL or in Java? [closed]

how does codeigniter sanitize inputs?

How do I edit BLOBs (containing JSON) in Oracle SQL Developer?

Application users account registration and login, best way to handle?

Does SQL Server TOP stop processing once it finds enough rows?

Select values from a table that are not in a list SQL
