Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I apply aggregate functions to data extracted from JSON in Google BigQuery?

I am extracting JSON data out of a BigQuery column using JSON_EXTRACT. Now I want to extract lists of values and run aggregate functions (like AVG) against them. Testing the JsonPath expression .objects[*].v succeeds on http://jsonpath.curiousconcept.com/. But the query:

SELECT
  JSON_EXTRACT(json_column, "$.id") as id,
  AVG(JSON_EXTRACT(json_column, "$.objects[*].v")) as average_value
FROM [tablename]

throws a JsonPath parse error on BigQuery. Is this possible on BigQuery? Or do I need to preprocess my data in order to run aggregate functions against data inside of my JSON?

My data looks similar to this:

# Record 1
{
  "id": "abc",
  "objects": [
    {
      "id": 1,
      "v": 1
    },
    {
      "id": 2,
      "v": 3
    }
  ]
}
# Record 2
{
  "id": "def",
  "objects": [
    {
      "id": 1,
      "v": 2
    },
    {
      "id": 2,
      "v": 5
    }
  ]
}

This is related to another question.

Update: The problem can be simplified by running two queries. First, run JSON_EXTRACT and save the results into a view. Secondly, run the aggregate function against this view. But even then I need to correct the JsonPath expression $.objects[*].v to prevent the JSONPath parse error.

like image 305
Ingo Avatar asked Oct 28 '14 18:10

Ingo


People also ask

How do you Unnest extract nested JSON data in BigQuery?

With Holistics's modeling layer, you can let your end-user have access to data in nested JSON arrays by: Write a SQL model to unnest repeated columns in BigQuery into a flat table. Set a relationship between this derived SQL model with the base model. Add the derived SQL model in a dataset to expose it to your end user.

Can BigQuery handle JSON?

BigQuery natively supports JSON data using the JSON data type. This document describes how to create a table with a JSON column, insert JSON data into a BigQuery table, and query JSON data.

What is array AGG in BigQuery?

ARRAY_AGG. Returns an ARRAY of expression values. To learn more about the optional arguments in this function and how to use them, see Aggregate function calls. To learn more about the OVER clause and how to use it, see Window function calls.


1 Answers

Leverage SPLIT() to pivot repeatable fields into separate rows. Also might be easier/cleaner to put this into a subquery and put AVG outside:

SELECT id, AVG(v) as average 
FROM (
SELECT 
    JSON_EXTRACT(json_column, "$.id") as id, 
    INTEGER( 
      REGEXP_EXTRACT(
        SPLIT(
          JSON_EXTRACT(json_column, "$.objects")
          ,"},{"
          )
        ,r'\"v\"\:([^,]+),')) as v FROM [mytable] 
)
GROUP BY id;
like image 199
vgt Avatar answered Sep 20 '22 03:09

vgt