I am extracting JSON data out of a BigQuery column using JSON_EXTRACT. Now I want to extract lists of values and run aggregate functions (like AVG) against them. Testing the JsonPath expression $.objects[*].v succeeds on http://jsonpath.curiousconcept.com/. But the query:
SELECT
JSON_EXTRACT(json_column, "$.id") as id,
AVG(JSON_EXTRACT(json_column, "$.objects[*].v")) as average_value
FROM [tablename]
throws a JsonPath parse error on BigQuery. Is this possible on BigQuery? Or do I need to preprocess my data in order to run aggregate functions against data inside of my JSON?
My data looks similar to this:
# Record 1
{
"id": "abc",
"objects": [
{
"id": 1,
"v": 1
},
{
"id": 2,
"v": 3
}
]
}
# Record 2
{
"id": "def",
"objects": [
{
"id": 1,
"v": 2
},
{
"id": 2,
"v": 5
}
]
}
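To pin down the result the query is meant to produce, here is a plain-Python reference computation over the two sample records. It is an illustration only, not BigQuery code: the list comprehension plays the role of `$.objects[*].v`, and the values are averaged per `id`.

```python
import json
from typing import Tuple

# The two sample records above, as they might be stored in json_column
records = [
    '{"id": "abc", "objects": [{"id": 1, "v": 1}, {"id": 2, "v": 3}]}',
    '{"id": "def", "objects": [{"id": 1, "v": 2}, {"id": 2, "v": 5}]}',
]

def average_v(json_text: str) -> Tuple[str, float]:
    """Return (id, mean of every objects[*].v) for one record."""
    doc = json.loads(json_text)
    values = [obj["v"] for obj in doc["objects"]]  # equivalent of $.objects[*].v
    return doc["id"], sum(values) / len(values)

averages = dict(average_v(r) for r in records)
print(averages)  # {'abc': 2.0, 'def': 3.5}
```

So the expected output is an average of 2.0 for `abc` and 3.5 for `def`.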
This is related to another question.
Update: The problem can be simplified by running two queries. First, run JSON_EXTRACT and save the results into a view. Secondly, run the aggregate function against this view. But even then I need to correct the JsonPath expression $.objects[*].v to prevent the JsonPath parse error.
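The two-step idea can be sketched with SQLite's JSON functions standing in for BigQuery (an assumption made only so the sketch runs locally). Instead of the `[*]` wildcard that triggers the parse error, a view flattens `objects` into one row per element with `json_each`, and `AVG` then runs against that view. In BigQuery standard SQL the rough counterpart of `json_each` would be `UNNEST(JSON_EXTRACT_ARRAY(json_column, '$.objects'))`. The table name `tablename` mirrors the question; the view name `flattened` is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (needs SQLite's JSON functions)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (json_column TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)",
    [
        ('{"id": "abc", "objects": [{"id": 1, "v": 1}, {"id": 2, "v": 3}]}',),
        ('{"id": "def", "objects": [{"id": 1, "v": 2}, {"id": 2, "v": 5}]}',),
    ],
)

# Step 1: a view with one row per element of the objects array.
# json_each(json_column, '$.objects') replaces the [*] wildcard.
conn.execute("""
    CREATE VIEW flattened AS
    SELECT json_extract(t.json_column, '$.id') AS id,
           json_extract(o.value, '$.v')        AS v
    FROM tablename AS t, json_each(t.json_column, '$.objects') AS o
""")

# Step 2: an ordinary aggregate against the flattened view.
rows = conn.execute(
    "SELECT id, AVG(v) AS average_value FROM flattened GROUP BY id ORDER BY id"
).fetchall()
print(rows)  # [('abc', 2.0), ('def', 3.5)]
```

The design point is that the aggregate never sees JSON at all: once the array is unnested into rows, `AVG ... GROUP BY id` is plain SQL.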
BigQuery also natively supports JSON data through the JSON data type, so a JSON column can be created, loaded, and queried directly.
ARRAY_AGG returns an ARRAY of expression values; it accepts optional arguments and can also be used as a window function with an OVER clause.
In legacy SQL, use SPLIT() to pivot the repeated values into separate rows. It may also be easier and cleaner to put this into a subquery and apply AVG outside:
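SELECT id, AVG(v) AS average
FROM (
  SELECT
    JSON_EXTRACT_SCALAR(json_column, "$.id") AS id,
    -- SPLIT turns the array text into a repeated field, one value per element;
    -- REGEXP_EXTRACT pulls the number after "v": (v is the last key, so no trailing comma)
    INTEGER(
      REGEXP_EXTRACT(
        SPLIT(JSON_EXTRACT(json_column, "$.objects"), "},{"),
        r'"v":\s*(-?\d+)')) AS v
  FROM [mytable]
)
GROUP BY id;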
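To see what the SPLIT/REGEXP_EXTRACT pipeline does to each record, here is a rough Python mirror of it, with `str.split` and `re` standing in for the BigQuery functions (an illustration, not BigQuery itself). It assumes JSON_EXTRACT returns the array as compact JSON without whitespace, which is what makes splitting on `},{` work. It also shows why the pattern must not insist on a trailing comma: `v` is the last key in each object, so a piece ends right after its value.

```python
import re

# Compact JSON as JSON_EXTRACT(json_column, "$.objects") is assumed to return it
objects_json = {
    "abc": '[{"id":1,"v":1},{"id":2,"v":3}]',
    "def": '[{"id":1,"v":2},{"id":2,"v":5}]',
}

V_PATTERN = re.compile(r'"v":\s*(-?\d+)')

def split_and_extract(array_text):
    """Mimic SPLIT(..., "},{") followed by REGEXP_EXTRACT on each piece."""
    values = []
    for piece in array_text.split("},{"):
        match = V_PATTERN.search(piece)
        if match:  # REGEXP_EXTRACT yields NULL when nothing matches
            values.append(int(match.group(1)))
    return values

averages = {}
for record_id, text in objects_json.items():
    vs = split_and_extract(text)
    averages[record_id] = sum(vs) / len(vs)  # AVG(v) grouped by id

print(averages)  # {'abc': 2.0, 'def': 3.5}

# A pattern requiring a trailing comma matches none of the split pieces
pieces = objects_json["abc"].split("},{")
print([re.search(r'"v":([^,]+),', p) for p in pieces])  # [None, None]
```

This string-splitting approach is fragile (it breaks if the JSON contains whitespace or nested objects), which is one reason to prefer unnesting the array with real JSON functions where available.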