Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to query arrays in bigquery?

schema in bigquery field: items type: string

Value in table in items field is stored as string {"data": [{"id": "1234", "plan": {"sub_id": "567", "metadata": {"currentlySelling": "true", "custom_attributes": "{\"shipping\": true,\"productLimit\":10}", "Features": "[\"10 products\", \"Online support\"]"}, "name": "Personal", "object": "plan"}, "quantity": 1}], "has_more": false}

Two Questions 1) How can i query within array eg: where shipping is true or where one of the features is "Online support" 2) The reason I had to store the data as string because "custom_attributes" value can change. Is there a better way to store data in bigquery when value of one of the nested key can change?

like image 432
BI Architect Avatar asked Jul 18 '17 18:07

BI Architect


People also ask

How do I sort an array in BigQuery?

Results from your query can be sorted by using the order_by argument. The argument can be used to sort nested objects too. The sort order (ascending vs. descending) is set by specifying the asc or desc enum value for the column name in the order_by input object, e.g. {name: desc} .

Can we use array in SQL query?

The ARRAY function returns an ARRAY with one element for each row in a subquery. If subquery produces a SQL table, the table must have exactly one column. Each element in the output ARRAY is the value of the single column of a row in the table.

How do I get a list of tables in BigQuery?

tables = client. list_tables(dataset_id) # Make an API request. Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

How do I Unnest multiple arrays in BigQuery?

Can We Unnest Multiple Arrays? When we use the UNNEST function on a column in BigQuery, all the rows under that column is flattened all at once. Currently, the UNNEST function does not accept multiple arrays as parameters. We need to merge the arrays into a single array before flattening it.


1 Answers

Your query would be something like this:

#standardSQL
SELECT game
FROM YourTable
WHERE EXISTS (SELECT 1 FROM UNNEST(participant) WHERE name = 'sam');

This returns all of the games where 'sam' was a participant. Here is a self-contained example:

#standardSQL
WITH YourTable AS (
  SELECT 'A' AS game, ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('tony', 12), ('julia', 12)] AS participant UNION ALL
  SELECT 'B', ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('max', 12), ('jacob', 12)] UNION ALL
  SELECT 'C', ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('max', 12), ('julia', 12)]
)
SELECT game
FROM YourTable
WHERE EXISTS (SELECT 1 FROM UNNEST(participant) WHERE name = 'sam');

If you wanted to pivot the data to have a column for each participant, you could use a query like this:

#standardSQL
CREATE TEMP FUNCTION WasParticipant(
    p_name STRING, participant ARRAY<STRUCT<name STRING, age INT64>>) AS (
  EXISTS(SELECT 1 FROM UNNEST(participant) WHERE name = p_name)
);

WITH YourTable AS (
  SELECT 'A' AS game, ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('tony', 12), ('julia', 12)] AS participant UNION ALL
  SELECT 'B', ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('max', 12), ('jacob', 12)] UNION ALL
  SELECT 'C', ARRAY<STRUCT<name STRING, age INT64>>[('sam', 12), ('max', 12), ('julia', 12)]
)
SELECT
  ARRAY_AGG(IF(WasParticipant('sam', participant), game, NULL) IGNORE NULLS) AS sams_games,
  ARRAY_AGG(IF(WasParticipant('tony', participant), game, NULL) IGNORE NULLS) AS tonys_games,
  ARRAY_AGG(IF(WasParticipant('julia', participant), game, NULL) IGNORE NULLS) AS julias_games,
  ARRAY_AGG(IF(WasParticipant('max', participant), game, NULL) IGNORE NULLS) AS maxs_games
FROM YourTable;

This returns an array with games played for each participant.

like image 112
Elliott Brossard Avatar answered Nov 05 '22 18:11

Elliott Brossard