DISTINCT with PARTITION BY vs. GROUP BY

Tags: sql, sql-server, group-by, distinct, query-performance

I have found some SQL queries in an application I am examining like this:

SELECT DISTINCT
Company, Warehouse, Item,
SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS stock

I'm quite sure this gives the same result as:

SELECT
Company, Warehouse, Item,
SUM(quantity) AS stock
GROUP BY Company, Warehouse, Item

Is there any benefit (performance, readability, additional flexibility in writing the query, maintainability, etc.) to using the first approach over the latter?

asked Dec 04 '13 12:12

Andris

1 Answer

Performance:

Winner: GROUP BY

Some very rudimentary testing on a large table with unindexed columns showed that, at least in my case, the two queries produced completely different query plans, and the PARTITION BY one was significantly slower.

The GROUP BY plan consisted of only a table scan and an aggregation operation, while the PARTITION BY plan had two nested-loop self-joins. On the second run, the PARTITION BY query took about 2800 ms, while the GROUP BY query took only 500 ms.
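To repeat this kind of comparison on a small scale, the sketch below runs both queries against an in-memory SQLite database from Python, checks that they return the same rows, and prints each plan. It is only an illustration: the table name `stock_moves` and the sample rows are made up (the original queries show no FROM clause), and SQLite stands in for SQL Server, so the plans and timings will not match the figures above.

```python
import sqlite3

# Hypothetical table and data; the original queries do not show a FROM clause.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_moves "
    "(Company TEXT, Warehouse TEXT, Item TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock_moves VALUES (?, ?, ?, ?)",
    [
        ("A", "W1", "bolt", 5),
        ("A", "W1", "bolt", 7),
        ("A", "W2", "bolt", 3),
        ("B", "W1", "nut", 4),
        ("B", "W1", "nut", -1),
    ],
)

WINDOW_SQL = """
    SELECT DISTINCT Company, Warehouse, Item,
           SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS stock
    FROM stock_moves
"""
GROUP_SQL = """
    SELECT Company, Warehouse, Item, SUM(quantity) AS stock
    FROM stock_moves
    GROUP BY Company, Warehouse, Item
"""


def rows(sql):
    # Sort in Python so the comparison does not depend on output order.
    return sorted(conn.execute(sql).fetchall())


def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite-specific; the last column is the step text.
    return [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]


window_rows = rows(WINDOW_SQL)
group_rows = rows(GROUP_SQL)
same_result = window_rows == group_rows

print("same result:", same_result)
print("window plan:", plan(WINDOW_SQL))
print("group plan: ", plan(GROUP_SQL))
```

On SQL Server itself, the equivalent step would be to compare the actual execution plans of the two statements against the real table, since the optimizer's choices depend on data volume and indexes.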

Readability / Maintainability:

Winner: GROUP BY

Based on the opinions of the commenters here, the PARTITION BY version is less readable for most developers, so it will probably also be harder to maintain in the future.

Flexibility:

Winner: PARTITION BY

PARTITION BY gives you more flexibility in choosing the grouping columns. With GROUP BY you can have only one set of grouping columns for all aggregated columns. With DISTINCT + PARTITION BY, each aggregated column can use a different set of partitioning columns. Also, on some DBMSs you can choose from more aggregate/analytic functions in the OVER clause.
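As a rough illustration of that flexibility, the sketch below computes three totals at different levels (per item, per warehouse, per company) in a single SELECT, which a single GROUP BY cannot express directly. The table `stock_moves`, its rows, and the column aliases are hypothetical, and the query is run through Python's sqlite3 module rather than SQL Server.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_moves "
    "(Company TEXT, Warehouse TEXT, Item TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock_moves VALUES (?, ?, ?, ?)",
    [
        ("A", "W1", "bolt", 5),
        ("A", "W1", "bolt", 7),
        ("A", "W1", "nut", 2),
        ("A", "W2", "bolt", 3),
        ("B", "W1", "nut", 4),
        ("B", "W1", "nut", -1),
    ],
)

# Each SUM uses its own PARTITION BY, so one row carries totals at
# three different levels of aggregation.
MIXED_SQL = """
    SELECT DISTINCT Company, Warehouse, Item,
           SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS item_stock,
           SUM(quantity) OVER (PARTITION BY Company, Warehouse)       AS warehouse_stock,
           SUM(quantity) OVER (PARTITION BY Company)                  AS company_stock
    FROM stock_moves
    ORDER BY Company, Warehouse, Item
"""

mixed_rows = conn.execute(MIXED_SQL).fetchall()
for row in mixed_rows:
    print(row)
```

Getting the same result with GROUP BY would typically need one grouped subquery per level, joined back together, which is where the window-function form starts to pay for its reduced readability.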

answered Oct 10 '22 04:10

Andris
