I create a table in Athena with below structure
CREATE EXTERNAL TABLE s3_json_objects (
devId string,
type string,
status string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://mybucket/folder1/data/athena_test/';
S3 bucket objects contains JSON structure like this
{ "devId": "00abcdef1122334401", "type": "lora", "status": "huihuhukiyg" }
However below SQL working correctly and return the correct result for only count
SELECT count(*) as total_s3_objects FROM "athena_db"."s3_json_objects"
BUT whenever I query below SQL select statement to fetch the JSON values from S3, It's returns result sets with empty values for columns
SELECT devid FROM "athena_db"."s3_json_objects" SELECT json_extract(devid , '$.devid') as Id FROM "athena_db"."s3_json_objects" SELECT * FROM "athena_db"."s3_json_objects"
Also, I review these links before post this question on StackOverflow and AWS Athena doc
Can't read json file via Amazon Athena
AWS Athena json_extract query from string field returns empty values
Any comments or suggestions would be much appreciated.
The JSON must be in a single line, as mentioned in this page of the AWS Athena documentation. You can have multiple JSON objects on separate lines, but each complete object must only span one line.
Example (this could all be in one S3 object):
{"devId": "a1", "type": "b1", "status": "c1"}
{"devId": "a2", "type": "b2", "status": "c2"}
Glue can read multi-line json objects because it has spark engine under the hood. One workaround is, transform those json objects to parquet using glue if you can't easily make those json objects on line.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With