Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWS Athena flattened data from nested JSON source

I'd like to create a table from a nested JSON in Athena. The solutions described here using tools like hive Openx-JsonSerDe attempt to mirror the JSON data in the SQL statement. I just want to get a few fields from the JSON file and create the table. I can't seem to find any resources on how to do that.

E.g. JSON file {"records": [{"a": "data1", "b": "data2", "c": "data3"}]} The table I'd like to create just only has columns a and b

like image 791
ebnius Avatar asked Apr 24 '17 19:04

ebnius


1 Answers

I think what you are trying to achieve is unnesting the array to transform one array entry into one row.

This is possible through the correct querying of your data structure.

table definition:

CREATE external TABLE complex (
   records array<struct<a:string,b:string>>
   )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket/test1/';

query:

select record.a,record.b from complex 
cross join UNNEST(complex.records) as t1(record);
like image 93
jens walter Avatar answered Sep 17 '22 12:09

jens walter