Error: Scalar subquery produced more than one element

Tags:

google-bigquery

We just started migrating our queries from Legacy SQL to Standard SQL, so we are still learning how to process nested data and arrays.

Basically, we want to retrieve the following data from the ga_sessions table:

visitor id, session id, array of skus
visitor 1, session 1, [sku_0, sku_1, (...), sku_n]
visitor 1, session 2, [skus]

To do so we ran this simple query:

  WITH
  customers_data AS(
  SELECT
    fullvisitorid fv,
    visitid v,
    ARRAY_AGG((
      SELECT
        prods.productsku
      FROM
        UNNEST(hits.product) prods)) sku
  FROM
    `dataset_id.ga_sessions_*`,
    UNNEST(hits) hits
  WHERE
    1 = 1
    AND _table_suffix BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
    AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
    --and (select count(productsku) from unnest(hits.product) where productsku is not null) = 1
  GROUP BY
    fv,
    v
  LIMIT
    100 )
SELECT
  *
FROM
  customers_data

But we get this error:

Error: Scalar subquery produced more than one element

The data that comes from the hits field looks something like this:

[Screenshot of the hits field; image not preserved]

When we added the WHERE clause back:

and (select count(productsku) from unnest(hits.product) where productsku is not null) = 1

It no longer gives an error, but the results contain duplicated SKUs, and we also lose the SKUs that sit inside the larger arrays.

Is there a mistake in our query that prevents the arrays from being unnested?


asked Dec 03 '16 01:12

1 Answer

If I understand correctly, I think you want something like this:

WITH customers_data AS (
  SELECT
    fullvisitorid fv,
    visitid v,
    ARRAY_CONCAT_AGG(ARRAY(
      SELECT productsku FROM UNNEST(hits.product))) sku
  FROM
    `dataset_id.ga_sessions_*`,
    UNNEST(hits) hits
  WHERE
    _table_suffix BETWEEN
      FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
      AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
  GROUP BY
    fv,
    v
  LIMIT
    100
)
SELECT
  *
FROM
  customers_data;

This preserves all of the SKUs through the use of ARRAY_CONCAT_AGG over an ARRAY subquery that extracts the SKUs for each row. If you want to deduplicate all of the SKUs across rows, you can replace

SELECT
  *
FROM
  customers_data;

with:

SELECT *
  REPLACE (ARRAY(SELECT DISTINCT s FROM UNNEST(sku) AS s) AS sku)
FROM
  customers_data;
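To try the deduplication step without querying the real tables, here is a minimal sketch that stands in for customers_data with a single hand-written row. The literal values ('visitor_1', 'sku_0', and so on) are made up for illustration, and the order of the elements in the deduplicated array is not guaranteed.

```sql
-- Hypothetical one-row stand-in for customers_data; the values are invented.
WITH customers_data AS (
  SELECT
    'visitor_1' AS fv,
    1 AS v,
    ['sku_0', 'sku_1', 'sku_0'] AS sku
)
SELECT *
  -- Rebuild the sku column from its distinct elements only.
  REPLACE (ARRAY(SELECT DISTINCT s FROM UNNEST(sku) AS s) AS sku)
FROM
  customers_data;
```

The fv and v columns pass through unchanged; only sku is rewritten.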

Edit: For more reading, take a look at types of expression subqueries in the documentation. In your case, you needed an ARRAY subquery, since the idea was to take an ARRAY<STRUCT<...>> in each row and transform it into an ARRAY of the field type in order to concatenate the arrays across rows.
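As a rough illustration of why the original query failed, the sketch below builds one row whose product array holds two elements. The table name T, the alias p, and the SKU values are assumptions made for the example. A plain parenthesized subquery is a scalar subquery and must return at most one row, so it fails as soon as an array has two or more elements; wrapping the same SELECT in ARRAY(...) collects every row into an array instead.

```sql
-- Hypothetical row standing in for a single hit with two products.
WITH T AS (
  SELECT [STRUCT('sku_0' AS productsku), STRUCT('sku_1' AS productsku)] AS product
)
SELECT
  -- Uncommenting the next line should reproduce the error, because the
  -- scalar subquery would return two rows for this input:
  -- (SELECT p.productsku FROM UNNEST(product) AS p) AS sku,
  -- The ARRAY subquery accepts any number of rows:
  ARRAY(SELECT p.productsku FROM UNNEST(product) AS p) AS skus
FROM T;
```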

ARRAY_AGG creates an array from individual elements, whereas ARRAY_CONCAT_AGG creates an array from the concatenation of arrays. The difference between them is similar to the difference between the array literal constructor [] and ARRAY_CONCAT, except that the _AGG versions are aggregate functions.
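A small side-by-side sketch of that distinction, using a made-up table T with one scalar column x and one array column arr. The first two output columns are the aggregate forms; the last two are their non-aggregate counterparts. Note that without an ORDER BY inside the aggregate, the element order of the aggregated arrays is not guaranteed.

```sql
-- Hypothetical two-row input; column names and values are invented.
WITH T AS (
  SELECT 1 AS x, [1, 2] AS arr UNION ALL
  SELECT 2, [3]
)
SELECT
  ARRAY_AGG(x) AS from_elements,                    -- one element per input row
  ARRAY_CONCAT_AGG(arr) AS from_arrays,             -- input arrays joined end to end
  [1, 2] AS literal_array,                          -- array built from elements
  ARRAY_CONCAT([1, 2], [3]) AS concatenated_array   -- array built from arrays
FROM T;
```

ARRAY_AGG(arr) would not work here, since BigQuery does not support arrays of arrays; that is the reason the query above needs ARRAY_CONCAT_AGG.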

As a standalone example, you can try:

WITH T AS (
  SELECT ARRAY<STRUCT<x INT64, y INT64>>[(1, 10), (2, 11), (3, 12)] AS arr UNION ALL
  SELECT ARRAY<STRUCT<x INT64, y INT64>>[(4, 13)] UNION ALL
  SELECT ARRAY<STRUCT<x INT64, y INT64>>[(5, 14), (6, 15)]
)
SELECT ARRAY(SELECT x FROM UNNEST(arr)) AS x_array
FROM T;

This returns a column x_array where the elements in each array are those of the x field from each element in arr. To concatenate all of the arrays so that there is a single row in the result, use ARRAY_CONCAT_AGG, e.g.:

WITH T AS (
  SELECT ARRAY<STRUCT<x INT64, y INT64>>[(1, 10), (2, 11), (3, 12)] AS arr UNION ALL
  SELECT ARRAY<STRUCT<x INT64, y INT64>>[(4, 13)] UNION ALL
  SELECT ARRAY<STRUCT<x INT64, y INT64>>[(5, 14), (6, 15)]
)
SELECT ARRAY_CONCAT_AGG(ARRAY(SELECT x FROM UNNEST(arr))) AS x_array
FROM T;

For your other question, REPLACE accepts a list of expressions paired with the columns that they are meant to replace. The expression can be something simple such as a literal, or it can be something more complicated such as an ARRAY subquery, which is what I used. For example:

WITH T AS (
  SELECT 1 AS x, 'foo' AS y, true AS z UNION ALL
  SELECT 2, 'bar', false UNION ALL
  SELECT 3, 'baz', true
)
SELECT * REPLACE(1 - x AS x, CAST(x AS STRING) AS y)
FROM T;

This replaces the original x and y columns that would have been returned from the SELECT * with the results of 1 - x and CAST(x AS STRING) instead.

answered Dec 05 '22 18:12

Elliott Brossard


Question asked by Willian Fuks.