I have an external table in hive
CREATE EXTERNAL TABLE FOO (
TS string,
customerId string,
products array< struct <productCategory:string, productId:string> >
)
PARTITIONED BY (ds string)
ROW FORMAT SERDE 'some.serde'
WITH SERDEPROPERTIES ('error.ignore'='true')
LOCATION 'some_locations'
;
A record of the table may hold data such as:
1340321132000, 'some_company', [{"productCategory":"footwear","productId":"nik3756"},{"productCategory":"eyewear","productId":"oak2449"}]
Do anyone know if there is a way to simply extract all the productCategory from this record and return it as an array of productCategories without using explode. Something like the following:
["footwear", "eyewear"]
Or do I need to write my own GenericUDF, if so, I do not know much Java (a Ruby person), can someone give me some hints? I read some instructions on UDF from Apache Hive. However, I do not know which collection type is best to handle array, and what collection type to handle structs?
===
I have somewhat answered this question by writing a GenericUDF, but I ran into 2 other problems. It is in this SO Question
You can use json serde or build-in functions get_json_object, json_tuple.
With rcongiu's Hive-JSON SerDe the usage will be:
define table:
CREATE TABLE complex_json (
DocId string,
Orders array<struct<ItemId:int, OrderDate:string>>)
load sample json into it (it is important for this data to be one-lined):
{"DocId":"ABC","Orders":[{"ItemId":1111,"OrderDate":"11/11/2012"},{"ItemId":2222,"OrderDate":"12/12/2012"}]}
Then fetching orders ids is as easy as:
SELECT Orders.ItemId FROM complex_json LIMIT 100;
It will return the list of ids for you:
itemid [1111,2222]
Proven to return correct results on my environment. Full listing:
add jar hdfs:///tmp/json-serde-1.3.6.jar;
CREATE TABLE complex_json (
DocId string,
Orders array<struct<ItemId:int, OrderDate:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
LOAD DATA INPATH '/tmp/test.json' OVERWRITE INTO TABLE complex_json;
SELECT Orders.ItemId FROM complex_json LIMIT 100;
Read more here:
http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html
One way would be to use either the inline
or explode
functions, like so:
SELECT
TS,
customerId,
pCat,
pId,
FROM FOO
LATERAL VIEW inline(products) p AS pCat, pId
Otherwise you can write UDF. Check out this post and this post for that. Along with the following resources:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With