How to group by rollup on only some columns in Apache Spark SQL?

Question

I'm using the the SQL API for Spark 3.0 in a Databricks 7.0 runtime cluster. I know that I can do the following:

select
  coalesce(a, "All A") as colA,
  coalesce(b, "All B") as colB,
  sum(c) as sumC
from
  myTable
group by rollup (
  colA,
  colB
)
order by
  colA asc,
  colB asc

I'd then expect an output like:

+-------+-------+------+
| colA  | colB  | sumC |
+-------+-------+------+
| All A | All B |  300 |
| a1    | All B |  100 |
| a1    | b1    |  30  |
| a1    | b2    |  70  |
| a2    | All B |  200 |
| a2    | b1    |  50  |
| a2    | b2    |  150 |
+-------+-------+------+

However, I'm trying to write a query where only column b needs to be rolled up. I've written something like:

select
  a as colA,
  coalesce(b, "All B") as colB,
  sum(c) as sumC
from
  myTable
group by 
  a,
  rollup (b)
order by
  colA asc,
  colB asc

And I'd expect an output like:

+-------+-------+------+
| colA  | colB  | sumC |
+-------+-------+------+
| a1    | All B |  100 |
| a1    | b1    |  30  |
| a1    | b2    |  70  |
| a2    | All B |  200 |
| a2    | b1    |  50  |
| a2    | b2    |  150 |
+-------+-------+------+

I know this sort of operation is supported in at least some SQL APIs, but I get Error in SQL statement: UnsupportedOperationException when trying to run the above query. Does anyone know whether this behavior is simply as-of-yet unsupported in Spark 3.0 or if I just have the syntax wrong? The docs aren't helpful on the subject.

I know that I can accomplish this with union all, but I'd prefer to avoid that route, if only for the sake of elegance and brevity.

Thanks in advance, and please let me know if I can clarify anything.

wBob · Accepted Answer

Try this GROUPING SETS option:

%sql
SELECT 
  COALESCE( a, 'all a' ) a, 
  COALESCE( b, 'all b' ) b, 
  SUM(c) c
FROM myTable
GROUP BY a, b
GROUPING SETS ( ( a , b ), a )
ORDER BY a, b

My results (with updated numbers):

My results

How to group by rollup on only some columns in Apache Spark SQL?

Tags:

apache-spark

apache-spark-sql

databricks

Keith

1 Answers

wBob

Recent Activity

Donate For Us

How to group by rollup on only some columns in Apache Spark SQL?

Tags:

apache-spark

apache-spark-sql

databricks

Keith

1 Answers

wBob

Related questions

Recent Activity

Donate For Us