We have loaded json blobs in a String field in a Bigquery table. I need to create a view (using standard sql)over the table that would extract the array field as a bigquery array/repeated field of "RECORD" type (which itself includes a repeated field).
Here is a sample record (json_blob):
{"order_id":"123456","customer_id":"2abcd", "items":[{"line":"1","ref_ids":["66b56e60","9e7ca2b7"],"sku":"1111","amount":40 },{"line":"2","ref_ids":["7777h0","8888j0"],"sku":"2222","amount":10 }]}
I am hoping to end up with a view that has the following layout:
[
{
"name": "order_id",
"type": "STRING",
"mode": "NULLABLE"
},
{
"mode": "NULLABLE",
"name": "customer_id",
"type": "STRING"
},
{
"mode": "REPEATED",
"name": "items",
"type": "RECORD",
"fields": [
{
"mode": "NULLABLE",
"name": "line",
"type": "STRING"
},
{
"mode": "REPEATED",
"name": "ref_ids",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "sku",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "amount",
"type": "INTEGER"
}
]
}
]
Json_extract(json_blob, '$.items') extracts the items parts, but how do I convert that to a bigquery array of type "RECORD" which then can be processed like normal bigquery array/repeated of STRUCT?
Appreciate any help.
With Holistics's modeling layer, you can let your end-user have access to data in nested JSON arrays by: Write a SQL model to unnest repeated columns in BigQuery into a flat table. Set a relationship between this derived SQL model with the base model. Add the derived SQL model in a dataset to expose it to your end user.
How to Query BigQuery Repeated Fields. To extract information from a repeated field in BigQuery, you must use a more exotic pattern. This is normally done using the UNNEST function, which converts an array of values in a table into rows. These can then be joined to the original table to be queried.
A RECORD column can have REPEATED mode, which is represented as an array of STRUCT types. Also, a field within a record can be repeated, which is represented as a STRUCT that contains an ARRAY . An array cannot contain another array directly. For more information, see Declaring an ARRAY type.
There is no way to do this using SQL functions in BigQuery at the time of this writing unless you can impose a hard limit on the number of values in the JSON array; see the relevant issue tracker item. Your options are:
Here is the approach using a JavaScript UDF:
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS STRUCT<order_id INT64, customer_id STRING, items ARRAY<STRUCT<line STRING, ref_ids ARRAY<STRING>, sku STRING, amount INT64>>>
LANGUAGE js AS """
return JSON.parse(input);
""";
WITH Input AS (
SELECT '{"order_id":"123456","customer_id":"2abcd", "items":[{"line":"1","ref_ids":["66b56e60","9e7ca2b7"],"sku":"1111","amount":40 },{"line":"2","ref_ids":["7777h0","8888j0"],"sku":"2222","amount":10 }]}' AS json
)
SELECT
JsonToItems(json).*
FROM Input;
If you do want to try the SQL-based approach without JavaScript, here's somewhat of a hack until the feature request above is resolved, where the number of array elements must be no more than 10:
#standardSQL
CREATE TEMP FUNCTION JsonExtractRefIds(json STRING) AS (
(SELECT ARRAY_AGG(v IGNORE NULLS)
FROM UNNEST([
JSON_EXTRACT_SCALAR(json, '$.ref_ids[0]'),
JSON_EXTRACT_SCALAR(json, '$.ref_ids[1]'),
JSON_EXTRACT_SCALAR(json, '$.ref_ids[2]'),
JSON_EXTRACT_SCALAR(json, '$.ref_ids[3]'),
JSON_EXTRACT_SCALAR(json, '$.ref_ids[4]'),
JSON_EXTRACT_SCALAR(json, '$.ref_ids[5]'),
JSON_EXTRACT_SCALAR(json, '$.ref_ids[6]'),
JSON_EXTRACT_SCALAR(json, '$.ref_ids[7]'),
JSON_EXTRACT_SCALAR(json, '$.ref_ids[8]'),
JSON_EXTRACT_SCALAR(json, '$.ref_ids[9]')]) AS v)
);
CREATE TEMP FUNCTION JsonToItem(json STRING)
RETURNS STRUCT<line STRING, ref_ids ARRAY<STRING>, sku STRING, amount INT64>
AS (
IF(json IS NULL, NULL,
STRUCT(
JSON_EXTRACT_SCALAR(json, '$.line'),
JsonExtractRefIds(json),
JSON_EXTRACT_SCALAR(json, '$.sku'),
CAST(JSON_EXTRACT_SCALAR(json, '$.amount') AS INT64)
)
)
);
CREATE TEMP FUNCTION JsonToItems(json STRING) AS (
(SELECT AS STRUCT
CAST(JSON_EXTRACT_SCALAR(json, '$.order_id') AS INT64) AS order_id,
JSON_EXTRACT_SCALAR(json, '$.customer_id') AS customer_id,
(SELECT ARRAY_AGG(v IGNORE NULLS)
FROM UNNEST([
JsonToItem(JSON_EXTRACT(json, '$.items[0]')),
JsonToItem(JSON_EXTRACT(json, '$.items[1]')),
JsonToItem(JSON_EXTRACT(json, '$.items[2]')),
JsonToItem(JSON_EXTRACT(json, '$.items[3]')),
JsonToItem(JSON_EXTRACT(json, '$.items[4]')),
JsonToItem(JSON_EXTRACT(json, '$.items[5]')),
JsonToItem(JSON_EXTRACT(json, '$.items[6]')),
JsonToItem(JSON_EXTRACT(json, '$.items[7]')),
JsonToItem(JSON_EXTRACT(json, '$.items[8]')),
JsonToItem(JSON_EXTRACT(json, '$.items[9]'))]) AS v) AS items
)
);
WITH Input AS (
SELECT '{"order_id":"123456","customer_id":"2abcd", "items":[{"line":"1","ref_ids":["66b56e60","9e7ca2b7"],"sku":"1111","amount":40 },{"line":"2","ref_ids":["7777h0","8888j0"],"sku":"2222","amount":10 }]}' AS json
)
SELECT
JsonToItems(json).*
FROM Input;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With