
SQL Server: Difference between PARTITION BY and GROUP BY

I've been using GROUP BY for all types of aggregate queries over the years. Recently, I've been reverse-engineering some code that uses PARTITION BY to perform aggregations. In reading through all the documentation I can find about PARTITION BY, it sounds a lot like GROUP BY, maybe with a little extra functionality added in? Are they two versions of the same general functionality, or are they something different entirely?

Mike Mooney asked Mar 08 '10 20:03



2 Answers

They're used in different places. group by modifies the entire query, like:

select customerId, count(*) as orderCount from Orders group by customerId 

But partition by just works on a window function, like row_number:

select row_number() over (partition by customerId order by orderId)
    as OrderNumberForThisCustomer
from Orders

A group by normally reduces the number of rows returned by rolling them up and calculating averages or sums for each group. partition by does not affect the number of rows returned, but it changes how a window function's result is calculated.
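To see the row-count difference directly, here is a minimal sketch that runs both queries against a small in-memory table. It assumes SQLite (via Python's sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for SQL Server, and the Orders rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderId INTEGER PRIMARY KEY, customerId INTEGER)")
conn.executemany(
    "INSERT INTO Orders (orderId, customerId) VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 10), (5, 20)],
)

# GROUP BY collapses the five orders into one row per customer.
grouped = conn.execute(
    "SELECT customerId, COUNT(*) AS orderCount "
    "FROM Orders GROUP BY customerId ORDER BY customerId"
).fetchall()

# PARTITION BY keeps all five rows and numbers them within each customer.
numbered = conn.execute(
    "SELECT orderId, customerId, "
    "ROW_NUMBER() OVER (PARTITION BY customerId ORDER BY orderId) "
    "AS OrderNumberForThisCustomer "
    "FROM Orders ORDER BY orderId"
).fetchall()

print(len(grouped), grouped)    # 2 rows: [(10, 3), (20, 2)]
print(len(numbered), numbered)  # 5 rows, one per order
```

The grouped query returns one row per customerId, while the windowed query returns every order with its position inside that customer's partition.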

Andomar answered Sep 23 '22 15:09


We can take a simple example.

Consider a table named TableA with the following values:

id  firstname  lastname  Mark
-----------------------------
1   arun       prasanth  40
2   ann        antony    45
3   sruthy     abc       41
6   new        abc       47
1   arun       prasanth  45
1   arun       prasanth  49
2   ann        antony    49

GROUP BY

The SQL GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.

Put more simply, the GROUP BY clause is used together with aggregate functions to group the result set by one or more columns.

Syntax:

SELECT expression1, expression2, ... expression_n,
       aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;

We can apply GROUP BY in our table:

select SUM(Mark) as marksum, firstname
from TableA
group by id, firstname

Results:

marksum  firstname
------------------
94       ann
134      arun
47       new
41       sruthy

The table has 7 rows. When we apply GROUP BY id, firstname, the server collapses the rows that share the same id and firstname into a single row per group, so only 4 rows come back.

In simple words:

GROUP BY normally reduces the number of rows returned by rolling them up and calculating SUM() for each group.
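The GROUP BY result above can be reproduced with a short script. This is a sketch that assumes SQLite (through Python's sqlite3 module) in place of SQL Server; the column types and the added ORDER BY are my own choices, made only so the output order is deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (id INTEGER, firstname TEXT, lastname TEXT, Mark INTEGER)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        (1, "arun", "prasanth", 40),
        (2, "ann", "antony", 45),
        (3, "sruthy", "abc", 41),
        (6, "new", "abc", 47),
        (1, "arun", "prasanth", 45),
        (1, "arun", "prasanth", 49),
        (2, "ann", "antony", 49),
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM TableA").fetchone()[0]

# One output row per (id, firstname) group; ORDER BY added for stable output.
grouped = conn.execute(
    "SELECT SUM(Mark) AS marksum, firstname "
    "FROM TableA GROUP BY id, firstname ORDER BY firstname"
).fetchall()

print(total_rows)  # 7 input rows
print(grouped)     # 4 groups: [(94, 'ann'), (134, 'arun'), (47, 'new'), (41, 'sruthy')]
```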

PARTITION BY

Before going to PARTITION BY, let us look at the OVER clause:

According to the MSDN definition:

OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
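As an illustration of the running totals mentioned in that definition, the sketch below adds an ORDER BY and an explicit frame inside OVER. The query is my own example on TableA, not part of the original answer, and it assumes SQLite (via Python's sqlite3 module) rather than SQL Server.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (id INTEGER, firstname TEXT, lastname TEXT, Mark INTEGER)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        (1, "arun", "prasanth", 40),
        (2, "ann", "antony", 45),
        (3, "sruthy", "abc", 41),
        (6, "new", "abc", 47),
        (1, "arun", "prasanth", 45),
        (1, "arun", "prasanth", 49),
        (2, "ann", "antony", 49),
    ],
)

# Running total of Mark within each id, accumulating in ascending Mark order.
# The explicit ROWS frame limits the window to the rows seen so far.
running = conn.execute(
    "SELECT id, Mark, "
    "SUM(Mark) OVER (PARTITION BY id ORDER BY Mark "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total "
    "FROM TableA ORDER BY id, Mark"
).fetchall()

for row in running:
    print(row)  # e.g. (1, 40, 40), (1, 45, 85), (1, 49, 134), ...
```

The total restarts at each new id, which is the "computation per window" behaviour the definition describes.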

PARTITION BY will not reduce the number of rows returned.

We can apply PARTITION BY in our example table:

SELECT SUM(Mark) OVER (PARTITION BY id) AS marksum, firstname FROM TableA 

Result:

marksum  firstname
------------------
134      arun
134      arun
134      arun
94       ann
94       ann
41       sruthy
47       new

Look at the results: the query partitions the rows by id and still returns all 7 rows, unlike GROUP BY, which returned only 4.
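One way to relate the two clauses: the windowed SUM gives the same per-row output as computing the totals with GROUP BY and joining them back to the detail rows. The sketch below checks that on TableA. The join query is my own illustration, and SQLite (via Python's sqlite3 module) is assumed in place of SQL Server.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (id INTEGER, firstname TEXT, lastname TEXT, Mark INTEGER)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        (1, "arun", "prasanth", 40),
        (2, "ann", "antony", 45),
        (3, "sruthy", "abc", 41),
        (6, "new", "abc", 47),
        (1, "arun", "prasanth", 45),
        (1, "arun", "prasanth", 49),
        (2, "ann", "antony", 49),
    ],
)

# Window version: each detail row keeps its own Mark next to the partition total.
windowed = conn.execute(
    "SELECT id, firstname, Mark, SUM(Mark) OVER (PARTITION BY id) AS marksum "
    "FROM TableA ORDER BY id, Mark"
).fetchall()

# GROUP BY version: aggregate first, then join the totals back to every row.
joined = conn.execute(
    "SELECT a.id, a.firstname, a.Mark, g.marksum "
    "FROM TableA AS a "
    "JOIN (SELECT id, SUM(Mark) AS marksum FROM TableA GROUP BY id) AS g "
    "ON g.id = a.id "
    "ORDER BY a.id, a.Mark"
).fetchall()

print(len(windowed))       # 7 rows, same as the base table
print(windowed == joined)  # True: both approaches produce identical rows
```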

Arunprasanth K V answered Sep 19 '22 15:09