BigQuery has facilities to parse JSON in real-time interactive queries: Just store the JSON encoded object as a string, and query in real time, with functions like JSON_EXTRACT_SCALAR.
However, I can't find a way to discover all the keys (properties) in these objects.
Can I use a UDF for this?
To extract the name and projects properties from the JSON string, use the json_extract function as in the following example. The json_extract function takes the column containing the JSON string, and searches it using a JSONPath -like expression with the dot . notation. JSONPath performs a simple tree traversal.
With Holistics's modeling layer, you can let your end-user have access to data in nested JSON arrays by: Write a SQL model to unnest repeated columns in BigQuery into a flat table. Set a relationship between this derived SQL model with the base model. Add the derived SQL model in a dataset to expose it to your end user.
Accessing nested json objects is just like accessing nested arrays. Nested objects are the objects that are inside an another object. In the following example 'vehicles' is a object which is inside a main object called 'person'. Using dot notation the nested objects' property(car) is accessed.
BigQuery natively supports JSON data using the JSON data type.
Here's something that uses Standard SQL:
CREATE TEMP FUNCTION jsonObjectKeys(input STRING)
RETURNS Array<String>
LANGUAGE js AS """
return Object.keys(JSON.parse(input));
""";
WITH keys AS (
SELECT
jsonObjectKeys(myColumn) AS keys
FROM
myProject.myTable
WHERE myColumn IS NOT NULL
)
SELECT
DISTINCT k
FROM keys
CROSS JOIN UNNEST(keys.keys) AS k
ORDER BY k
Below version fixes some "issues" in original answer like:
1. only first level of keys was emitted
2. having to manually comppile and than run final query for extracting info based on discovered keys
SELECT type, key, value, COUNT(1) AS weight
FROM JS(
(SELECT json, type
FROM [fh-bigquery:openlibrary.ol_dump_20151231@0]
WHERE type = '/type/edition'
),
json, type, // Input columns
"[{name: 'type', type:'string'}, // Output schema
{name: 'key', type:'string'},
{name: 'value', type:'string'}]",
"function(r, emit) { // The function
x = JSON.parse(r.json);
processKey(x, '');
function processKey(node, parent) {
if (parent !== '') {parent += '.'};
Object.keys(node).map(function(key) {
value = node[key].toString();
if (value !== '[object Object]') {
emit({type:r.type, key:parent + key, value:value});
} else {
processKey(node[key], parent + key);
};
});
};
}"
)
GROUP EACH BY type, key, value
ORDER BY weight DESC
LIMIT 1000
The result is as below
Row type key value weight
1 /type/edition type.key /type/edition 25140209
2 /type/edition last_modified.type /type/datetime 25140209
3 /type/edition created.type /type/datetime 17092292
4 /type/edition languages.0.key /languages/eng 14514830
5 /type/edition notes.type /type/text 11681480
6 /type/edition revision 2 8714084
7 /type/edition latest_revision 2 8704217
8 /type/edition revision 3 5041680
9 /type/edition latest_revision 3 5040634
10 /type/edition created.value 2008-04-01T03:28:50.625462 3579095
11 /type/edition revision 1 3396868
12 /type/edition physical_format Paperback 3181270
13 /type/edition revision 4 3053266
14 /type/edition latest_revision 4 3053197
15 /type/edition revision 5 2076094
16 /type/edition latest_revision 5 2076072
17 /type/edition publish_country nyu 1727347
18 /type/edition created.value 2008-04-30T09:38:13.731961 1681227
19 /type/edition publish_country enk 1627969
20 /type/edition publish_places London 1613755
21 /type/edition physical_format Hardcover 1495864
22 /type/edition publish_places New York 1467779
23 /type/edition revision 6 1437467
24 /type/edition latest_revision 6 1437463
25 /type/edition publish_country xxk 1407624
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With