Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Amazon athena can't read S3 JSON Object files and Athena select query returns empty result sets for JSON key columns

I create a table in Athena with below structure

CREATE EXTERNAL TABLE s3_json_objects (
    devId string,
    type string,
    status string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://mybucket/folder1/data/athena_test/';

S3 bucket objects contains JSON structure like this

{
    "devId": "00abcdef1122334401",
    "type": "lora",
    "status": "huihuhukiyg"
}

However below SQL working correctly and return the correct result for only count

SELECT count(*) as total_s3_objects FROM "athena_db"."s3_json_objects"

BUT whenever I query below SQL select statement to fetch the JSON values from S3, It's returns result sets with empty values for columns

SELECT devid FROM "athena_db"."s3_json_objects"
SELECT json_extract(devid , '$.devid') as Id FROM "athena_db"."s3_json_objects"
SELECT * FROM "athena_db"."s3_json_objects"

enter image description here

Also, I review these links before post this question on StackOverflow and AWS Athena doc

Can't read json file via Amazon Athena

AWS Athena json_extract query from string field returns empty values

Any comments or suggestions would be much appreciated.

like image 858
SamDev Avatar asked Oct 06 '18 19:10

SamDev


2 Answers

The JSON must be in a single line, as mentioned in this page of the AWS Athena documentation. You can have multiple JSON objects on separate lines, but each complete object must only span one line.

Example (this could all be in one S3 object):

{"devId": "a1", "type": "b1", "status": "c1"}
{"devId": "a2", "type": "b2", "status": "c2"}
like image 60
Tom Avatar answered Oct 05 '22 13:10

Tom


Glue can read multi-line json objects because it has spark engine under the hood. One workaround is, transform those json objects to parquet using glue if you can't easily make those json objects on line.

like image 20
Tanveer Uddin Avatar answered Oct 05 '22 11:10

Tanveer Uddin