
Bigquery - json_extract all elements from an array

I'm trying to extract two keys from every JSON object in an array of JSON objects (using legacy SQL). Currently I am using the JSON_EXTRACT function:

json_extract(json_column , '$[1].X') AS X,
json_extract(json_column , '$[1].Y') AS Y,

How can I make it run on every JSON object in the JSON array column, and not just on [1] (for example)?

An example json:

[
{"blabla":0,"X":1,"blabla":0,"blabla":0,"blabla":0,"Y":"2"},
{"blabla":0,"X":3,"blabla":0,"blabla":0,"blabla":0,"Y":"4"}
]

thanks in advance!

am_am asked Aug 31 '18 17:08




2 Answers

Update 2020: JSON_EXTRACT_ARRAY()

Now BigQuery supports JSON_EXTRACT_ARRAY():

  • https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#json_extract_array

For example, to solve this particular question:

SELECT id
  , ARRAY(
      SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') 
      FROM UNNEST(JSON_EXTRACT_ARRAY(payload, "$.commits")) x
  ) emails
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'



Previous answer

Let's start with a similar problem. This is not a very convenient way to extract all emails from a JSON array, because every index has to be written out by hand:

SELECT id
  , [ JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[0].author.email')  
      , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[1].author.email')  
      , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[2].author.email')  
      , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[3].author.email')
    ] emails
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'


At the time of this answer, the best way to deal with this was to use some JavaScript in a UDF to split a JSON array into a SQL array:

CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(json).map(x=>JSON.stringify(x));
"""; 

SELECT * EXCEPT(array_commits),
  ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') FROM UNNEST(array_commits) x) emails
FROM (
  SELECT id
    , json2array(JSON_EXTRACT(payload, '$.commits')) array_commits
  FROM `githubarchive.day.20180830` 
  WHERE type='PushEvent' 
  AND id='8188163772'
)
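To see what the body of the json2array UDF does outside BigQuery, the same expression can be run in plain JavaScript (for example in Node.js). This is only a sketch: the sample payload is made up for illustration and is not taken from the githubarchive table.

```javascript
// Same expression as the body of the json2array UDF above.
function json2array(json) {
  // Parse the JSON array, then re-serialize each element so that every
  // item becomes an independent JSON string (what ARRAY<STRING> holds).
  return JSON.parse(json).map(x => JSON.stringify(x));
}

// Hypothetical sample shaped like the '$.commits' value of a PushEvent.
const commits =
  '[{"author":{"email":"a@example.com"}},{"author":{"email":"b@example.com"}}]';

const elements = json2array(commits);
console.log(elements.length); // 2
console.log(elements[0]);     // {"author":{"email":"a@example.com"}}
```

Each element is itself a JSON string, which is why the outer query can still apply JSON_EXTRACT_SCALAR(x, '$.author.email') to it after UNNEST.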


Felipe Hoffa answered Sep 17 '22 18:09


May 1st, 2020 Update

A new function, JSON_EXTRACT_ARRAY, has just been added to the list of JSON functions. This function allows you to extract the contents of a JSON document as a string array.

So the CUSTOM_JSON_EXTRACT UDF used in the original answer further down can be replaced with the built-in JSON_EXTRACT_ARRAY function, as in this example:

#standardSQL
SELECT 
  JSON_EXTRACT_SCALAR(json , '$.X') AS X,
  JSON_EXTRACT_SCALAR(json , '$.Y') AS Y
FROM t, UNNEST(JSON_EXTRACT_ARRAY(json_column , '$')) json   
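As a rough illustration of the rows this query yields for the sample data from the question, here is a plain JavaScript emulation (runnable in Node.js). It does not call BigQuery: extractArray and extractScalar are hypothetical stand-ins that mimic only the '$' and '$.key' cases used above, assuming scalar values come back as strings.

```javascript
// Hypothetical stand-in for JSON_EXTRACT_ARRAY(json, '$'):
// one JSON string per array element.
function extractArray(json) {
  return JSON.parse(json).map(el => JSON.stringify(el));
}

// Hypothetical stand-in for JSON_EXTRACT_SCALAR(json, '$.<key>'):
// scalars are returned as strings, anything else as null.
function extractScalar(json, key) {
  const value = JSON.parse(json)[key];
  if (value === undefined || value === null || typeof value === "object") {
    return null;
  }
  return String(value);
}

const jsonColumn = '[{"blabla1":1,"X":1,"Y":"2"},{"blabla1":2,"X":3,"Y":"4"}]';

// Rough equivalent of:
//   FROM t, UNNEST(JSON_EXTRACT_ARRAY(json_column, '$')) json
const rows = extractArray(jsonColumn).map(json => ({
  X: extractScalar(json, "X"),
  Y: extractScalar(json, "Y"),
}));

console.log(JSON.stringify(rows)); // [{"X":"1","Y":"2"},{"X":"3","Y":"4"}]
```

Unlike the UDF approach further down, this form produces one output row per array element rather than one array per input row.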

==============

The example below is for BigQuery Standard SQL. It stays close to the standard way of working with JSONPath and needs no extra manipulation: you simply call the CUSTOM_JSON_EXTRACT(json, json_path) function.

#standardSQL
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
        return jsonPath(JSON.parse(json), json_path);
"""
OPTIONS (
    library="gs://your_bucket/jsonpath-0.8.0.js"
);
WITH t AS (
SELECT '''
[
{"blabla1":1,"X":1,"blabla2":3,"blabla3":5,"blabla4":7,"Y":"2"},
{"blabla1":2,"X":3,"blabla2":4,"blabla3":6,"blabla4":8,"Y":"4"}
]   
''' AS json_column 
)
SELECT 
  CUSTOM_JSON_EXTRACT(json_column , '$[*].X') AS X,
  CUSTOM_JSON_EXTRACT(json_column , '$[*].Y') AS Y
FROM t   

The result will be:

Row X   Y    
1   1   2    
    3   4      

Note: to work around BigQuery's current "limitation" for JSONPath, the solution above uses a custom function together with an external library, jsonpath-0.8.0.js. The library can be downloaded from https://code.google.com/archive/p/jsonpath/downloads and uploaded to Google Cloud Storage, for example as gs://your_bucket/jsonpath-0.8.0.js
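To illustrate what the wildcard path '$[*].X' selects, here is a minimal JavaScript sketch. pluck is a hypothetical helper written for this example; it is not part of jsonpath-0.8.0.js and supports only the '$[*].key' form over a top-level array.

```javascript
// Hypothetical helper: handles only paths of the form '$[*].key'.
function pluck(json, path) {
  const match = /^\$\[\*\]\.(\w+)$/.exec(path);
  if (!match) throw new Error("unsupported path: " + path);
  const key = match[1];
  // Collect the value of `key` from every array element that has it.
  return JSON.parse(json)
    .filter(el => el !== null && typeof el === "object" && key in el)
    .map(el => el[key]);
}

const jsonColumn = `[
{"blabla1":1,"X":1,"blabla2":3,"blabla3":5,"blabla4":7,"Y":"2"},
{"blabla1":2,"X":3,"blabla2":4,"blabla3":6,"blabla4":8,"Y":"4"}
]`;

console.log(JSON.stringify(pluck(jsonColumn, "$[*].X"))); // [1,3]
console.log(JSON.stringify(pluck(jsonColumn, "$[*].Y"))); // ["2","4"]
```

The wildcard collects one value per array element into a single array, which matches the shape of the result shown above: one row whose X and Y columns are arrays.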

Having just re-read Felipe's answer: for his example, the above solution would look like this (just as an FYI):

SELECT 
  id, 
  CUSTOM_JSON_EXTRACT(payload, '$.commits[*].author.email') emails
FROM `githubarchive.day.20180830` 
WHERE type='PushEvent' 
AND id='8188163772'
Mikhail Berlyant answered Sep 19 '22 18:09