When to use GROUPING SETS, CUBE and ROLLUP

Tags: sql, sql-server, grouping, rollup, cube

I have recently learned about GROUPING SETS, CUBE and ROLLUP for defining multiple grouping sets in SQL Server.

What I am asking is: under what circumstances do we use these features? What are their benefits and advantages?

SELECT shipperid, YEAR(shippeddate) AS shipyear, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY GROUPING SETS
(
  ( shipperid, YEAR(shippeddate) ),
  ( shipperid ),
  ( YEAR(shippeddate) ),
  ( )
);

SELECT shipperid, YEAR(shippeddate) AS shipyear, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY CUBE( shipperid, YEAR(shippeddate) );

SELECT shipcountry, shipregion, shipcity, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY ROLLUP( shipcountry, shipregion, shipcity );


asked Aug 12 '14 22:08

Shin Kazama

2 Answers

Firstly, for those who haven't already read up on the subject:

Using GROUP BY with ROLLUP, CUBE, and GROUPING SETS

That being said, don't think about these grouping options as ways to get a result set. These are performance tools.

Let's take ROLLUP as a simple example.

I can use the following query to get the count of records for each value of GrpCol.

SELECT   GrpCol, count(*) AS cnt
FROM     dbo.MyTable
GROUP BY GrpCol

And I can use the following query to summarily "roll up" the count of ALL records.

SELECT   NULL, count(*) AS cnt
FROM     dbo.MyTable

And I could UNION ALL the above two queries to get the exact same results I might get if I had written the first query with the ROLLUP clause (that's why I put the NULL in there).
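As a minimal sketch of that equivalence (T-SQL, SQL Server 2008 or later), assuming a hypothetical temp table #MyTable standing in for dbo.MyTable, the ROLLUP query and the hand-built UNION ALL query below return the same four rows:

```sql
-- Hypothetical sample data standing in for dbo.MyTable.
CREATE TABLE #MyTable (GrpCol varchar(10) NULL);
INSERT INTO #MyTable (GrpCol) VALUES ('A'), ('A'), ('B'), ('C'), ('C'), ('C');

-- ROLLUP: one count per GrpCol value plus a grand-total row whose GrpCol is NULL.
SELECT   GrpCol, COUNT(*) AS cnt
FROM     #MyTable
GROUP BY ROLLUP (GrpCol);

-- Hand-built equivalent: the grouped counts UNION ALL the overall count.
SELECT   GrpCol, COUNT(*) AS cnt
FROM     #MyTable
GROUP BY GrpCol
UNION ALL
SELECT   NULL, COUNT(*)
FROM     #MyTable;
-- Both return A 2, B 1, C 3 and NULL 6 (row order is not guaranteed without ORDER BY).
```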

It might actually be more convenient for me to execute this as two different queries, because then I have the grouped results separate from my totals. Why would I want my final total mixed right into the rest of those results? The answer is that doing both together using the ROLLUP clause is more efficient: SQL Server can use an execution plan that calculates all of the aggregations together in one pass over the data. The UNION ALL version returns exactly the same results but uses a less efficient execution plan (two table scans instead of one).

Imagine an extreme example in which you are working on a data set so large that each scan of the data takes one whole hour. You have to provide totals on basically every possible dimension (way to slice) that data every day. Aha! I bet one of these grouping options is exactly what you need. If you save off the results of that one scan into a special schema layout, you will then be able to run reports for the rest of the day off the saved results.
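A minimal T-SQL sketch of that idea, with hypothetical names (#Sales stands in for the huge fact table, #SalesSummary for the saved results): the fact table is scanned once with CUBE, GROUPING_ID records which grouping set each summary row belongs to, and later reports filter the small summary instead of rescanning. In a real warehouse the summary would be a permanent table rather than a temp table.

```sql
-- Hypothetical fact data standing in for the very large table.
CREATE TABLE #Sales (Region varchar(10), Product varchar(10), Amount int);
INSERT INTO #Sales (Region, Product, Amount) VALUES
    ('East', 'Bike', 100), ('East', 'Helmet', 20),
    ('West', 'Bike', 150), ('West', 'Helmet', 30);

-- One scan of the big table; every subtotal is saved in the summary table.
-- GROUPING_ID(Region, Product) tags each row's grouping set:
--   0 = (Region, Product), 1 = (Region), 2 = (Product), 3 = grand total.
SELECT   Region, Product, SUM(Amount) AS TotalAmount,
         GROUPING_ID(Region, Product) AS GroupingLevel
INTO     #SalesSummary
FROM     #Sales
GROUP BY CUBE (Region, Product);

-- Later reports read the small summary instead of rescanning #Sales.
SELECT Region, TotalAmount FROM #SalesSummary WHERE GroupingLevel = 1;  -- East 120, West 180
SELECT TotalAmount FROM #SalesSummary WHERE GroupingLevel = 3;          -- 300
```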

So I'm basically saying that you're working on a data warehouse project. For the rest of us it mostly falls into the "neat thing to know" category.


answered Sep 23 '22 01:09

SurroundedByFish

CUBE is the same as GROUPING SETS with all possible combinations.

So this (using CUBE)

GROUP BY CUBE (C1, C2, C3, ..., Cn-2, Cn-1, Cn)

is the same as this (using GROUPING SETS)

GROUP BY GROUPING SETS (
      (C1, C2, C3, ..., Cn-2, Cn-1, Cn)  -- All n dimensions are included.
     ,(C2, C3, ..., Cn-2, Cn-1, Cn)      -- n-1 dimensions are included (C1 left out).
     ,(C1, C3, ..., Cn-2, Cn-1, Cn)      -- (C2 left out)
     …
     ,(C1, C2, C3, ..., Cn-2, Cn-1)      -- (Cn left out)
     ,(C3, ..., Cn-2, Cn-1, Cn)          -- n-2 dimensions are included.
     ,(C1, ..., Cn-2, Cn-1, Cn)
     …
     ,(C1, C2)                           -- 2 dimensions are included.
     ,…
     ,(C1, Cn)
     ,…
     ,(Cn-1, Cn)
     ,…
     ,(C1)                               -- 1 dimension is included.
     ,(C2)
     ,…
     ,(Cn-1)
     ,(Cn)
     ,()                                 -- Grand total, 0 dimensions are included.
)

Since CUBE over n columns always produces all 2^n grouping sets, if you don't really need every combination you should use GROUPING SETS rather than CUBE.
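To make the expansion concrete for n = 3, here is a small T-SQL sketch with a hypothetical temp table #T: CUBE (a, b, c) and the explicit list of all 2^3 = 8 grouping sets return the same rows.

```sql
-- Hypothetical three-dimension table.
CREATE TABLE #T (a char(1), b char(1), c char(1), qty int);
INSERT INTO #T (a, b, c, qty) VALUES ('x', 'p', 'm', 1), ('x', 'q', 'n', 2), ('y', 'p', 'n', 4);

-- CUBE over three columns...
SELECT   a, b, c, SUM(qty) AS total
FROM     #T
GROUP BY CUBE (a, b, c);

-- ...is shorthand for every subset of {a, b, c}: 8 grouping sets.
SELECT   a, b, c, SUM(qty) AS total
FROM     #T
GROUP BY GROUPING SETS (
           (a, b, c),               -- 3 dimensions
           (a, b), (a, c), (b, c),  -- 2 dimensions
           (a), (b), (c),           -- 1 dimension
           ()                       -- grand total
         );
-- Both return the same 19 rows for this sample data.
```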

ROLLUP and CUBE operators generate some of the same result sets and perform some of the same calculations as OLAP applications. The CUBE operator generates a result set that can be used for cross tabulation reports. A ROLLUP operation can calculate the equivalent of an OLAP dimension or hierarchy.
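For the hierarchy case, a T-SQL sketch using a hypothetical temp table #Orders with the column names from the question: ROLLUP (shipcountry, shipregion, shipcity) keeps only the n + 1 = 4 "prefix" grouping sets (city, region, country, grand total), whereas CUBE over the same three columns would produce all 8.

```sql
-- Hypothetical sample rows using the column names from the question.
CREATE TABLE #Orders (shipcountry varchar(20), shipregion varchar(20), shipcity varchar(20));
INSERT INTO #Orders (shipcountry, shipregion, shipcity) VALUES
    ('USA', 'WA', 'Seattle'), ('USA', 'WA', 'Seattle'), ('USA', 'WA', 'Kirkland'),
    ('USA', 'OR', 'Portland'), ('Canada', 'BC', 'Vancouver');

-- ROLLUP removes columns from the right, one level of the hierarchy at a time.
SELECT   shipcountry, shipregion, shipcity, COUNT(*) AS numorders
FROM     #Orders
GROUP BY ROLLUP (shipcountry, shipregion, shipcity);

-- Equivalent explicit form.
SELECT   shipcountry, shipregion, shipcity, COUNT(*) AS numorders
FROM     #Orders
GROUP BY GROUPING SETS (
           (shipcountry, shipregion, shipcity),  -- per city
           (shipcountry, shipregion),            -- per region
           (shipcountry),                        -- per country
           ()                                    -- grand total
         );
-- Both return 10 rows: 4 city rows, 3 region subtotals, 2 country subtotals, 1 grand total (5).
```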

Look here to see Grouping Sets Equivalents

UPDATE

I think an example would help here. Suppose you have a table of the number of UFO sightings by country and gender, like below:

╔═════════╦═══════╦═════════╗
║ COUNTRY ║ GENDER║ #SIGHTS ║
╠═════════╬═══════╬═════════╣
║ USA     ║ F     ║     450 ║
║ USA     ║ M     ║    1500 ║
║ ITALY   ║ F     ║     704 ║
║ ITALY   ║ M     ║     720 ║
║ SWEDEN  ║ F     ║     317 ║
║ SWEDEN  ║ M     ║     310 ║
║ BRAZIL  ║ F     ║     144 ║
║ BRAZIL  ║ M     ║     159 ║
╚═════════╩═══════╩═════════╝

Then, if you want only the totals for each country, the totals for each gender, and the grand total, you should use GROUPING SETS:

select Country, Gender, sum(Number_Of_Sights)
from Table1
group by GROUPING SETS((Country), (Gender), ())
order by Country, Gender
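A self-contained T-SQL version of that query, assuming a hypothetical temp table #Table1 loaded with the sample rows above (the answer's query uses a permanent Table1), so the output can be checked by hand:

```sql
-- Hypothetical temp table holding the sample rows from the table above.
CREATE TABLE #Table1 (Country varchar(10), Gender char(1), Number_Of_Sights int);
INSERT INTO #Table1 (Country, Gender, Number_Of_Sights) VALUES
    ('USA', 'F', 450), ('USA', 'M', 1500),
    ('ITALY', 'F', 704), ('ITALY', 'M', 720),
    ('SWEDEN', 'F', 317), ('SWEDEN', 'M', 310),
    ('BRAZIL', 'F', 144), ('BRAZIL', 'M', 159);

-- Only three grouping sets: per country, per gender, grand total.
SELECT   Country, Gender, SUM(Number_Of_Sights) AS Total
FROM     #Table1
GROUP BY GROUPING SETS ((Country), (Gender), ())
ORDER BY Country, Gender;
-- Seven rows: one per country (BRAZIL 303, ITALY 1424, SWEDEN 627, USA 1950),
-- one per gender (F 1615, M 2689) and the grand total (4304).
```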


To get the same result with plain GROUP BY, you would have to use UNION ALL:

select Country, NULL Gender, sum(Number_Of_Sights) from Table1 group by Country
UNION ALL
select NULL Country, Gender, sum(Number_Of_Sights) from Table1 group by Gender
UNION ALL
select NULL Country, NULL Gender, sum(Number_Of_Sights) from Table1
order by Country, Gender


But it is not possible to obtain exactly this result with CUBE, since CUBE always returns every combination, including the country-and-gender rows.

Now, if you want to know all possible combinations, then you should use CUBE.
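For comparison, a T-SQL sketch of the CUBE version over the same sample rows (again in a hypothetical temp table, here #Sightings). It returns 15 rows: the 8 country/gender detail rows that the GROUPING SETS query above leaves out, plus the 4 country totals, the 2 gender totals and the grand total. GROUPING() is used only to label the subtotal rows, since a NULL in a grouping column is otherwise easy to misread.

```sql
-- Hypothetical temp table holding the UFO sample rows.
CREATE TABLE #Sightings (Country varchar(10), Gender char(1), Number_Of_Sights int);
INSERT INTO #Sightings (Country, Gender, Number_Of_Sights) VALUES
    ('USA', 'F', 450), ('USA', 'M', 1500),
    ('ITALY', 'F', 704), ('ITALY', 'M', 720),
    ('SWEDEN', 'F', 317), ('SWEDEN', 'M', 310),
    ('BRAZIL', 'F', 144), ('BRAZIL', 'M', 159);

-- CUBE (Country, Gender) = GROUPING SETS ((Country, Gender), (Country), (Gender), ()).
-- GROUPING(col) is 1 when col was aggregated away in that row, so it labels subtotals.
SELECT   CASE WHEN GROUPING(Country) = 1 THEN 'ALL' ELSE Country END AS CountryLabel,
         CASE WHEN GROUPING(Gender) = 1 THEN 'ALL' ELSE Gender END AS GenderLabel,
         SUM(Number_Of_Sights) AS Total
FROM     #Sightings
GROUP BY CUBE (Country, Gender)
-- Detail rows first, then country totals, gender totals and the grand total.
ORDER BY GROUPING(Country), GROUPING(Gender), Country, Gender;
```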

answered Sep 24 '22 01:09

Nizam
