Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Workaround for multiple rollups

Is there a way to accomplish the following in BigQuery? This syntax is supported in a DB such as Postgres:

SELECT ProductGroup, Product, Year, Month, AVG(Revenue) 
FROM Sales
group by rollup(ProductGroup, Product), rollup(Year, Month)

In other words, I want the cross product of the two rollups:

ROLLUP(ProductGroup, Product) --> (), (ProductGroup), (ProductGroup, Product)
ROLLUP(Year, Month) --> (), (Year), (Year, Month)

((), (ProductGroup), (ProductGroup, Product)) x ((), (Year), (Year, Month))
= (
    (), (ProductGroup), (ProductGroup, Product),
    (Year), (Year, ProductGroup), (Year, ProductGroup, Product).
     (Year, Month), (Year, Month, ProductGroup), (Year, Month, ProductGroup, Product)
)

When attempting in BQ I get the following error:

The GROUP BY clause only supports ROLLUP when there are no other grouping elements at [2:10]


Here is an update with some sample pictures and data.

First, I want to replicate the functionality of an Excel Pivot table. This is where the cross product of the rollups from ROWS and COLS come into play:

enter image description here

Note that the Pivot table has 63 value cells.

Now, the correct SQL syntax for this is as follows in the verbose GROUP BY-only syntax:

enter image description here

Notice that this also produces exactly 63 rows (and since we have one value column -- SUM Revenue -- 63 rows x 1 col = 63 value cells). The query is the following:

with sales as (
    select 2010 Year, 'Jan' Month, 'Electronics' ProductGroup, 'Phone' Product, 100 Revenue union all
    select 2010,    'Jan',  'Electronics',  'Laptop',   200 union all
    select 2010,    'Jan',  'Cars', 'Jeep', 250 union all
    select 2010,    'Jan',  'Cars', 'Hummer',   105 union all
    select 2010,    'Feb',  'Electronics',  'Phone',    110 union all
    select 2010,    'Feb',  'Electronics',  'Laptop',   300 union all
    select 2010,    'Feb',  'Cars', 'Jeep', 50 union all
    select 2010,    'Feb',  'Cars', 'Hummer',   75 union all
    select 2010,    'Mar',  'Electronics',  'Phone',    80 union all
    select 2010,    'Mar',  'Electronics',  'Laptop',   200 union all
    select 2010,    'Mar',  'Cars', 'Jeep', 100 union all
    select 2010,    'Mar',  'Cars', 'Hummer',   50 union all
    select 2011,    'Jan',  'Electronics',  'Phone',    200 union all
    select 2011,    'Jan',  'Electronics',  'Laptop',   300 union all
    select 2011,    'Jan',  'Cars', 'Jeep', 100 union all
    select 2011,    'Jan',  'Cars', 'Hummer',   200 union all
    select 2011,    'Feb',  'Electronics',  'Phone',    300 union all
    select 2011,    'Feb',  'Electronics',  'Laptop',   900 union all
    select 2011,    'Feb',  'Cars', 'Jeep', 100 union all
    select 2011,    'Feb',  'Cars', 'Hummer',   200 union all
    select 2011,    'Mar',  'Electronics',  'Phone',    400 union all
    select 2011,    'Mar',  'Electronics',  'Laptop',   350 union all
    select 2011,    'Mar',  'Cars', 'Jeep', 240 union all
    select 2011,    'Mar',  'Cars', 'Hummer',   130
)
-- ROLLUP(ProductGroup, Product), ROLLUP(Year, Month)
--> (), (ProductGroup), (ProductGroup, Product)
--> (Year), (Year, ProductGroup), (Year, ProductGroup, Product)
--> (Year, Month), (Year, Month, ProductGroup), (Year, Month, ProductGroup, Product)

SELECT NULL, NULL, NULL, NULL, AVG(Revenue) FROM Sales UNION ALL                                                -- ()
SELECT ProductGroup, NULL, NULL, NULL, AVG(Revenue) FROM Sales GROUP BY ProductGroup UNION ALL                  -- (ProductGroup)
SELECT ProductGroup, Product, NULL, NULL, AVG(Revenue) FROM Sales GROUP BY ProductGroup, Product UNION ALL      -- (ProductGroup, Product)

SELECT NULL, NULL, Year, NULL, AVG(Revenue) FROM Sales GROUP BY Year UNION ALL                                  -- (Year)
SELECT ProductGroup, NULL, Year, NULL, AVG(Revenue) FROM Sales GROUP BY Year, ProductGroup UNION ALL            -- (Year, ProductGroup)
SELECT ProductGroup, Product, Year, NULL, AVG(Revenue) FROM Sales GROUP BY Year, ProductGroup, Product UNION ALL-- (Year, ProductGroup, Product)

SELECT NULL, NULL, Year, Month, AVG(Revenue) FROM Sales GROUP BY Year, Month UNION ALL                          -- (Year, Month)
SELECT ProductGroup, NULL, Year, Month, AVG(Revenue) FROM Sales GROUP BY ProductGroup, Year, Month UNION ALL    -- (ProductGroup, Year, Month)
SELECT ProductGroup, Product, Year, Month, AVG(Revenue) FROM Sales GROUP BY ProductGroup, Product, Year, Month  -- (ProductGroup, Product, Year Month)

However, this query is really a nightmare to product -- even if generated programatically -- as there may be an order by, subselect, ... etc and union-ing all those statements together could potentially turn into a monstrous construction (for example, a 3 rows x 3 cols construction with a 100-line SQL statement would become 4^2 * 100 lines of sql, and 5x5 would be 5^2 * 100 lines, etc. if my math is correct).

What would be the proper way to do this then? Note that in a database like Postgres the following works as-is:

SELECT ProductGroup, Product, Year, Month, AVG(Revenue) FROM Sales GROUP BY ROLLUP(ProductGroup, Product), ROLLUP(Year, Month);

Here is the Saved Query if you want to use this as a starting point: https://console.cloud.google.com/bigquery?sq=260144861653:552549d2a81a47b59df6e3d16ef9bf17.


Finally, if you think it would be a useful feature to add in the GROUPING SETS and CUBE, please upvote this feature request: https://issuetracker.google.com/issues/204913323.

like image 226
David542 Avatar asked Nov 04 '21 16:11

David542


People also ask

What does the rollup subclause provide in terms of grouping and how does it differ from the CUBE subclause?

Unlike the CUBE subclause, ROLLUP does not create all possible grouping sets based on the dimension columns; the CUBE makes a subset of those. When generating the grouping sets, ROLLUP assumes a hierarchy among the dimension columns and only generates grouping sets based on this hierarchy.

Which clause is used for rolling up data?

Introduction to SQL ROLLUP The ROLLUP is an extension of the GROUP BY clause. The ROLLUP option allows you to include extra rows that represent the subtotals, which are commonly referred to as super-aggregate rows, along with the grand total row.

What is the difference between rollup and CUBE?

ROLLUP and CUBE are simple extensions to the SELECT statement's GROUP BY clause. ROLLUP creates subtotals at any level of aggregation needed, from the most detailed up to a grand total. CUBE is an extension similar to ROLLUP , enabling a single statement to calculate all possible combinations of subtotals.

What is use of rollup function in GROUP BY clause?

It allows us to generate multiple grouping sets within a single query, which is impossible with the GROUP BY clause as it aggregates a single group. Thus, we can say that the ROLLUP provides a more detailed analysis by employing a single query to create several grouping sets along the hierarchy of columns.


Video Answer


3 Answers

Back to my original answer of grouping twice...

WITH
  rollup_pg_p AS
(
  SELECT
    ProductGroup, Product, 1 AS dummy, Year, Month, SUM(Revenue) AS sum_rev, COUNT(Revenue) AS cnt_row
  FROM
    Sales
  GROUP BY
    ROLLUP(Year, Month, ProductGroup, Product)
  HAVING
    Month IS NOT NULL -- This prevents the roll up going further than desired
                      -- Effectively giving `GROUP BY Year, Month, ROLLUP(ProductGroup, Product)
)
SELECT
  ProductGroup, Product, Year, Month, SUM(sum_rev) / SUM(cnt_row)
FROM
  rollup_pg_p
GROUP BY
  ROLLUP(ProductGroup, Product, Dummy, Year, Month)
HAVING
  dummy IS NOT NULL -- Same 'trick' again, but we created the dummy column
                    -- as ProductGroup and Product CAN legitimately be NULL at this point.
ORDER BY
  1, 2, 3, 4

(NOTE: Other dialects would use WHERE NOT GROUPING(Product), so avoid the need for the dummy column, but BigQuery doesn't appear to have that functionality either...)

Still has the downside of not working for some aggregates, but might be meaningfully faster than the alternative approaches.

like image 58
MatBailie Avatar answered Oct 21 '22 03:10

MatBailie


ugly but probably the simplest way to have 3 group by statement and union them :

SELECT ProductGroup,Product,NULL year ,NULL month, AVG(sales.Revenue) avg  
FROM sales 
GROUP BY ROLLUP(ProductGroup,Product)

UNION DISTINCT
SELECT ProductGroup,Product,Year,NULL month, AVG(sales.Revenue) avg  
FROM sales 
GROUP BY ROLLUP(Year, ProductGroup, Product)

UNION DISTINCT 
SELECT ProductGroup,Product,Year,MONTH, AVG(sales.Revenue) avg  
FROM sales 
GROUP BY ROLLUP(Year, Month, ProductGroup, Product)

GCP fiddle

like image 23
eshirvana Avatar answered Oct 21 '22 05:10

eshirvana


it gives the correct number of rows but how would I pass along the AVG(Revenue)?

Consider below - looks like a simple pattern to apply to more of your potential cases

select r.Year, r.Month, r.ProductGroup, r.Product, 
  round(avg(Revenue), 2) avg_Revenue
from (
  select * from (
    select ProductGroup, Product from Sales group by rollup(ProductGroup, Product)
  ), (
    select Year, Month from Sales group by rollup(Year, Month)
  )
) r 
join sales s
on if(r.Year is null, true, r.Year = s.Year) 
and if(r.Month is null, true, r.Month = s.Month) 
and if(r.ProductGroup is null, true, r.ProductGroup = s.ProductGroup)
and if(r.Product is null, true, r.Product = s.Product) 
group by r.Year, r.Month, r.ProductGroup, r.Product     

if applied to sample data in your script - output is (cut by top rows)

enter image description here

like image 1
Mikhail Berlyant Avatar answered Oct 21 '22 05:10

Mikhail Berlyant