I store the following JSON objects in a Hive table:
{
"main_id": "qwert",
"features": [
{
"scope": "scope1",
"name": "foo",
"value": "ab12345",
"age": 50,
"somelist": ["abcde","fghij"]
},
{
"scope": "scope2",
"name": "bar",
"value": "cd67890"
},
{
"scope": "scope3",
"name": "baz",
"value": [
"A",
"B",
"C"
]
}
]
}
"features" is an array of varying length, i.e. all objects are optional. The objects have arbitrary elements, but all of them contain "scope", "name" and "value".
This is the Hive table I created:
CREATE TABLE tbl(
main_id STRING,features array<struct<scope:STRING,name:STRING,value:array<STRING>,age:INT,somelist:array<STRING>>>
)
I need a Hive query that returns the main_id and the value of the struct with the name "baz", i.e.,
main_id baz_value
qwert ["A","B","C"]
My problem is that the Hive UDF "get_json_object" supports only a limited version of JSONPath. It does not support a path like get_json_object(features, '$.features[?(@.name='baz')]')
.
How can query the wanted result with Hive? Is it maybe easier with another Hive table structure?
I found a solution for this:
Use the Hive explode UDTF to explode the struct array, i.e., create a second (temporary) table with one record for each struct in the array "features".
CREATE TABLE tbl_exploded as
select main_id,
f.name as f_name,
f.value as f_value
from tbl
LATERAL VIEW explode(features) exploded_table as f
-- optionally filter here instead of in 2nd query:
-- where f.name = 'baz';
The result of this is:
qwert, foo, ab12345
qwert, bar, cd67890
qwert, baz, ["A","B","C"]
Now you can select the main_id and value like this:
select main_id, f_value from tbl_exploded where f_name = 'baz';
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With