I have a table with loads of fields, and I am trying to group by all of them except two values, which I am summing. I would like to do something like:
SELECT my_table.* except(value_1, value_2)
, sum(value_1)
, sum(value_2)
FROM my_table
GROUP BY my_table.* except(value_1, value_2)
But unfortunately GROUP BY my_table.* except(value_1, value_2) does not work. Any suggestions, please?
A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output. Note: SELECT * EXCEPT does not exclude columns that do not have names.
The SQL GROUP BY clause arranges rows with identical values into groups. It is used together with aggregate functions to summarize the rows that share the same values in the grouped columns. It typically appears in a SELECT statement, after the WHERE clause and before the ORDER BY clause.
If you specify a GROUP BY clause, it must reference every column in the SELECT list that is not inside an aggregate function. Each grouping item can be a column name, an expression, or the ordinal position of the item in the SELECT list.
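As a small illustration of grouping by ordinal position, here is a sketch in BigQuery Standard SQL. The `sample` CTE and its rows are hypothetical stand-ins for a real table; the non-aggregated columns are listed explicitly so the ordinals are easy to follow.

```sql
#standardSQL
-- Hypothetical inline rows standing in for a real table.
WITH sample AS (
  SELECT 1 AS value_1, 2 AS value_2, 3 AS value_3, 4 AS value_4 UNION ALL
  SELECT 11, 12, 3, 14 UNION ALL
  SELECT 21, 22, 3, 14
)
SELECT
  value_3,
  value_4,
  SUM(value_1) AS sum_value_1,
  SUM(value_2) AS sum_value_2
FROM sample
-- Ordinals refer to SELECT-list positions: 1 = value_3, 2 = value_4.
GROUP BY 1, 2
```

This is equivalent to writing GROUP BY value_3, value_4. It saves typing, but you still have to know how many non-aggregated columns there are.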
Below is for BigQuery Standard SQL
#standardSQL
SELECT DISTINCT * EXCEPT(value_1, value_2, grp),
SUM(value_1) OVER(PARTITION BY grp) sum_value_1,
SUM(value_2) OVER(PARTITION BY grp) sum_value_2
FROM (
SELECT *, REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grp
FROM `project.dataset.table` t
)
You can test and play with the above using dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 value_1, 2 value_2, 3 value_3, 4 value_4 UNION ALL
SELECT 11, 12, 3, 14 UNION ALL
SELECT 21, 22, 3, 14
)
SELECT DISTINCT * EXCEPT(value_1, value_2, grp),
SUM(value_1) OVER(PARTITION BY grp) sum_value_1,
SUM(value_2) OVER(PARTITION BY grp) sum_value_2
FROM (
SELECT *, REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grp
FROM `project.dataset.table` t
)
with the result:
Row  value_3  value_4  sum_value_1  sum_value_2
1    3        14       32           34
2    3        4        1            2
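To see why the rows group this way, it can help to look at the grp key on its own. The sketch below reuses the same dummy rows and selects only the generated key. The comments show the keys I would expect, assuming TO_JSON_STRING emits compact JSON with no spaces.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 value_1, 2 value_2, 3 value_3, 4 value_4 UNION ALL
  SELECT 11, 12, 3, 14 UNION ALL
  SELECT 21, 22, 3, 14
)
SELECT
  value_1,
  value_2,
  -- Serialize the row to JSON, then strip the "value_1" and "value_2" entries
  -- so that only the remaining columns form the grouping key.
  REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grp
FROM `project.dataset.table` t
-- Expected keys under the assumption above:
--   first row:        {"value_3":3,"value_4":4}
--   other two rows:   {"value_3":3,"value_4":14}
```

Note that the lazy pattern `.+?[,}]` stops at the first comma or closing brace, so it relies on the excluded values being plain numbers, as they are here. A string value containing a comma would likely leave fragments behind in the key and split the groups.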
This approach works with any number of columns, and you don't need to reference them all explicitly. Only the columns to be excluded have to be named: value_1 and value_2 in this example.
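As a possible alternative, newer BigQuery releases document a GROUP BY ALL clause that infers the grouping keys from the non-aggregated items in the SELECT list. The sketch below assumes that clause is available in your project and that it combines with `* EXCEPT` as written; please check both against the current documentation before relying on it.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 value_1, 2 value_2, 3 value_3, 4 value_4 UNION ALL
  SELECT 11, 12, 3, 14 UNION ALL
  SELECT 21, 22, 3, 14
)
SELECT
  * EXCEPT(value_1, value_2),
  SUM(value_1) AS sum_value_1,
  SUM(value_2) AS sum_value_2
FROM `project.dataset.table`
-- Assumption: GROUP BY ALL groups by every non-aggregated SELECT item,
-- here the columns left after the EXCEPT.
GROUP BY ALL
```

If it works in your environment, this avoids the JSON key and the DISTINCT plus window functions, at the cost of depending on a newer feature.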