Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery GroupBy with STRUCT

In BigQuery, I can successfully run the following query using standard SQL:

SELECT 
  COUNT(*) AS totalCount,
  city,
  DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
FROM 
  sandbox.CountByCity
GROUP BY 
    city, start

But it fails when I nest the start value in a STRUCT, like this...

SELECT 
  COUNT(*) AS totalCount,
  city,
  STRUCT(
    DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
  ) as timeSpan
FROM 
  sandbox.CountByCity
GROUP BY 
    city, timeSpan.start

In this case, I get the following error message:

Cannot GROUP BY field references from SELECT list alias timeSpan at [10:11]

What is the correct way to write the query so that the start value is nested within a STRUCT?

like image 411
Greg McFall Avatar asked Feb 05 '23 03:02

Greg McFall


2 Answers

You can do this using ANY_VALUE. The struct value that you get is well-defined, since the value is the same for the entire group:

SELECT 
  COUNT(*) AS totalCount,
  city,
  ANY_VALUE(STRUCT(
    DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
  )) as timeSpan
FROM 
  sandbox.CountByCity
GROUP BY 
    city, DATE_TRUNC(timeInterval.intervalStart, YEAR);

Here is an example using some sample data:

WITH `sandbox.CountByCity` AS (
  SELECT 'Seattle' AS city, STRUCT(DATE '2017-12-11' AS intervalStart) AS timeInterval UNION ALL
  SELECT 'Seattle', STRUCT(DATE '2016-11-10' AS intervalStart) UNION ALL
  SELECT 'Seattle', STRUCT(DATE '2017-03-24' AS intervalStart) UNION ALL
  SELECT 'Kirkland', STRUCT(DATE '2017-02-01' AS intervalStart)
)
SELECT 
  COUNT(*) AS totalCount,
  city,
  ANY_VALUE(STRUCT(
    DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
  )) as timeSpan
FROM 
  `sandbox.CountByCity`
GROUP BY 
    city, DATE_TRUNC(timeInterval.intervalStart, YEAR);

You could also consider submitting a feature request to enable GROUP BY with STRUCT types.

like image 181
Elliott Brossard Avatar answered Feb 15 '23 19:02

Elliott Brossard


Not sure why exactly you would wanted this - but believe it is for some reason - so try below (at least formally it does what you ask)

#standardSQL
SELECT
  totalCount, 
  city, 
  STRUCT(start) timeSpan
FROM (
  SELECT 
    COUNT(*) AS totalCount,
    city,
    DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
  FROM `sandbox.CountByCity`
  GROUP BY city, start
)
like image 35
Mikhail Berlyant Avatar answered Feb 15 '23 18:02

Mikhail Berlyant