
SQL Server: Difference between PARTITION BY and GROUP BY

Tags:

sql-server

tsql

aggregate-functions

window-functions

I've been using GROUP BY for all types of aggregate queries over the years. Recently, I've been reverse-engineering some code that uses PARTITION BY to perform aggregations. In reading through all the documentation I can find about PARTITION BY, it sounds a lot like GROUP BY, maybe with a little extra functionality added in? Are they two versions of the same general functionality, or are they something different entirely?

320

asked Mar 08 '10 20:03

Mike Mooney

2 Answers

They're used in different places. group by modifies the entire query, like:

select customerId, count(*) as orderCount
from Orders
group by customerId

But partition by just works on a window function, like row_number:

select row_number() over (partition by customerId order by orderId)
    as OrderNumberForThisCustomer
from Orders

A group by normally reduces the number of rows returned by rolling them up and calculating averages or sums for each group. partition by does not affect the number of rows returned, but it changes how a window function's result is calculated.
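To see the row-count difference concretely, here is a minimal sketch that runs both queries against a small in-memory table. The Orders rows are made up for illustration, and SQLite (via Python's sqlite3 module, assuming a bundled SQLite of 3.25 or later for window functions) stands in for SQL Server; the two queries themselves are ordinary SQL that should behave the same way there.

```python
import sqlite3

# Assumption: SQLite in memory stands in for SQL Server; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderId INTEGER, customerId INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 10), (5, 20)],
)

# GROUP BY collapses the five orders into one row per customer.
grouped = conn.execute(
    """
    SELECT customerId, COUNT(*) AS orderCount
    FROM Orders
    GROUP BY customerId
    ORDER BY customerId
    """
).fetchall()

# PARTITION BY keeps all five rows; the numbering restarts for each customer.
numbered = conn.execute(
    """
    SELECT customerId, orderId,
           ROW_NUMBER() OVER (PARTITION BY customerId ORDER BY orderId)
               AS OrderNumberForThisCustomer
    FROM Orders
    ORDER BY customerId, orderId
    """
).fetchall()

print(len(grouped), "rows from GROUP BY:", grouped)
print(len(numbered), "rows from PARTITION BY:", numbered)
```

The grouped query returns one row per customer, while the windowed query returns one row per order with a per-customer sequence number attached.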

129

answered Sep 23 '22 15:09

Andomar

We can take a simple example.

Consider a table named TableA with the following values:

id  firstname  lastname  Mark
-----------------------------
1   arun       prasanth  40
2   ann        antony    45
3   sruthy     abc       41
6   new        abc       47
1   arun       prasanth  45
1   arun       prasanth  49
2   ann        antony    49

GROUP BY

The SQL GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.

In more simple words GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.

Syntax:

SELECT expression1, expression2, ... expression_n,
       aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;

We can apply GROUP BY in our table:

SELECT SUM(Mark) AS marksum, firstname
FROM TableA
GROUP BY id, firstname

Results:

marksum  firstname
------------------
94       ann
134      arun
47       new
41       sruthy

Our table has 7 rows. When we apply GROUP BY, the server groups them by id (and firstname) and returns one row per group, 4 rows here.

In simple words:

Here GROUP BY reduces the number of rows returned by rolling them up and calculating SUM() for each group.

PARTITION BY

Before going to PARTITION BY, let us look at the OVER clause:

According to the MSDN definition:

OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
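The running totals mentioned in that definition can be sketched on the TableA data as follows. This is only an illustration under assumptions: SQLite (through Python's sqlite3, version 3.25 or later) stands in for SQL Server, and the ORDER BY Mark and the explicit ROWS frame are choices made for this example, not something taken from the original query.

```python
import sqlite3

# Assumption: SQLite in memory stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (id INTEGER, firstname TEXT, lastname TEXT, Mark INTEGER)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        (1, "arun", "prasanth", 40),
        (2, "ann", "antony", 45),
        (3, "sruthy", "abc", 41),
        (6, "new", "abc", 47),
        (1, "arun", "prasanth", 45),
        (1, "arun", "prasanth", 49),
        (2, "ann", "antony", 49),
    ],
)

# Adding ORDER BY inside OVER turns the per-partition sum into a running total:
# each row sees the sum of its own Mark and all smaller Marks for the same id.
running = conn.execute(
    """
    SELECT id, Mark,
           SUM(Mark) OVER (
               PARTITION BY id
               ORDER BY Mark
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM TableA
    ORDER BY id, Mark
    """
).fetchall()

for row in running:
    print(row)
```

For id 1 the running total climbs 40, 85, 134 across its three rows, and it restarts for every other id.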

PARTITION BY will not reduce the number of rows returned.

We can apply PARTITION BY in our example table:

SELECT SUM(Mark) OVER (PARTITION BY id) AS marksum, firstname
FROM TableA

Result:

marksum  firstname
------------------
134      arun
134      arun
134      arun
94       ann
94       ann
41       sruthy
47       new

Look at the results: PARTITION BY splits the rows into partitions and computes the sum for each partition, but it still returns all 7 rows, unlike GROUP BY.
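One way to relate the two results is that the PARTITION BY output matches what you would get by computing the GROUP BY totals and joining them back onto the detail rows. The sketch below checks this on the TableA data; SQLite (through Python's sqlite3, version 3.25 or later) is an assumed stand-in for SQL Server, and the aliases a and g are names chosen for this example.

```python
import sqlite3

# Assumption: SQLite in memory stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (id INTEGER, firstname TEXT, lastname TEXT, Mark INTEGER)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        (1, "arun", "prasanth", 40),
        (2, "ann", "antony", 45),
        (3, "sruthy", "abc", 41),
        (6, "new", "abc", 47),
        (1, "arun", "prasanth", 45),
        (1, "arun", "prasanth", 49),
        (2, "ann", "antony", 49),
    ],
)

# GROUP BY: one row per id.
grouped = conn.execute(
    """
    SELECT id, firstname, SUM(Mark) AS marksum
    FROM TableA
    GROUP BY id, firstname
    ORDER BY id
    """
).fetchall()

# PARTITION BY: every detail row, each carrying its partition's total.
windowed = conn.execute(
    """
    SELECT id, firstname, Mark, SUM(Mark) OVER (PARTITION BY id) AS marksum
    FROM TableA
    ORDER BY id, Mark
    """
).fetchall()

# The same result written without a window function:
# aggregate in a subquery, then join the totals back to the detail rows.
joined = conn.execute(
    """
    SELECT a.id, a.firstname, a.Mark, g.marksum
    FROM TableA AS a
    JOIN (SELECT id, SUM(Mark) AS marksum FROM TableA GROUP BY id) AS g
      ON g.id = a.id
    ORDER BY a.id, a.Mark
    """
).fetchall()

print(len(grouped), len(windowed), windowed == joined)
```

The grouped query yields 4 rows and the windowed query 7, and the windowed rows are identical to the joined ones, which is why PARTITION BY is often a shorter way to attach an aggregate to each detail row.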

answered Sep 19 '22 15:09

Arunprasanth K V

