
BigQuery: GROUP BY all columns except a few

I have a table with a lot of fields, and I am trying to group by all of them except two value columns, which I am summing. I would like to do something like this:

SELECT my_table.* except(value_1, value_2)
    , sum(value_1)
    , sum(value_2)
FROM my_table
GROUP BY my_table.* except(value_1, value_2)

Unfortunately, GROUP BY my_table.* except(value_1, value_2) does not work. Any suggestions, please?

DarioB asked Feb 20 '19 17:02



1 Answer

The query below is for BigQuery Standard SQL:

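#standardSQL
-- DISTINCT collapses the per-row window results to one row per group
SELECT DISTINCT * EXCEPT(value_1, value_2, grp),
  SUM(value_1) OVER(PARTITION BY grp) sum_value_1,
  SUM(value_2) OVER(PARTITION BY grp) sum_value_2
FROM (
  -- grp = the row as JSON with the value_1 and value_2 entries stripped out
  SELECT *, REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grp
  FROM `project.dataset.table` t
)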
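
To see how the grp key is built, the inner step can be run on its own. This is only a sketch: the sample_rows CTE and the other_col column are made up for illustration and are not part of the original question. The two rows differ only in the excluded columns, so they should end up with the same grp string.

```sql
#standardSQL
-- Hypothetical two-row input, made up for illustration only.
WITH sample_rows AS (
  SELECT 1 AS value_1, 2 AS value_2, 'a' AS other_col UNION ALL
  SELECT 5, 6, 'a'
)
SELECT
  -- The whole row serialized as one JSON string
  TO_JSON_STRING(t) AS json_row,
  -- Same string with the "value_1":... and "value_2":... entries removed;
  -- what is left identifies the group
  REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') AS grp
FROM sample_rows t
```

Partitioning by grp is then equivalent to grouping by every column that is left in the string.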

You can test and play with the query using dummy data, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 value_1, 2 value_2, 3 value_3, 4 value_4 UNION ALL
  SELECT 11, 12, 3, 14 UNION ALL
  SELECT 21, 22, 3, 14
)
SELECT DISTINCT * EXCEPT(value_1, value_2, grp),
  SUM(value_1) OVER(PARTITION BY grp) sum_value_1,
  SUM(value_2) OVER(PARTITION BY grp) sum_value_2
FROM (
  SELECT *, REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grp
  FROM `project.dataset.table` t
)

with the result:

Row value_3 value_4 sum_value_1 sum_value_2  
1   3       14      32          34   
2   3       4       1           2    

The query works with any number of columns, and you don't need to reference them all explicitly. Only the columns to be excluded have to be named (value_1 and value_2 in this example).
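
As a cross-check, here is a sketch of the explicit form that the query stands in for, run on the same dummy data. It lists the grouping columns by hand (value_3 and value_4), which is exactly what the question wants to avoid on a wide table, but it should return the same two groups with the same sums. Row order is not guaranteed.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 value_1, 2 value_2, 3 value_3, 4 value_4 UNION ALL
  SELECT 11, 12, 3, 14 UNION ALL
  SELECT 21, 22, 3, 14
)
-- Conventional aggregation: every non-aggregated column is listed in GROUP BY
SELECT value_3, value_4,
  SUM(value_1) AS sum_value_1,
  SUM(value_2) AS sum_value_2
FROM `project.dataset.table`
GROUP BY value_3, value_4
```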

Mikhail Berlyant answered Oct 21 '22 16:10