I have a simple JSON file:
{'oldname':'mau'}
I want to read this file in AWS Athena, so I created a matching table t:
CREATE EXTERNAL TABLE IF NOT EXISTS stats_json.t (
`oldname` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
) LOCATION 's3://mybucket/stats/';
then I try to query:
select * from t limit 10;
and get an error:
Query bceb274d-309f-40d5-a893-570de5f4ca4e failed with error code HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Missing value at 1 [character 2 line 1]
Where do I go wrong?
I got it to work, so to answer my own question: the problem was the format of the JSON file. It seems AWS Athena (or rather org.openx.data.jsonserde.JsonSerDe) is rather picky about the format of the JSON file it reads.
Each JSON record must be entirely on one line of text, not pretty-printed across several lines. I also wrote the records with no spaces between keys and values.
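To check whether an existing file meets the one-record-per-line requirement before pointing Athena at it, something like the following rough sketch may help. The helper name find_bad_lines is made up for illustration, and it relies on Python's own json parser, which may be stricter or looser than the SerDe in some cases, so treat it as a sanity check rather than a guarantee.

```python
import json


def find_bad_lines(text):
    """Return (line_number, reason) for every line that is not a standalone JSON object."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are simply skipped by this check
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, 'not a JSON object'))
    return problems


# A pretty-printed record spans several lines, so no single line parses on its own.
pretty = '{\n  "oldname": "mau"\n}'
# The same record written compactly on one line.
compact = '{"oldname":"mau"}'

print(find_bad_lines(pretty))   # every line is reported
print(find_bad_lines(compact))  # []
```

The pretty-printed variant is flagged on each of its lines, while the compact one passes.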
In Python, I generated the JSON records as follows:
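import json

dStatsRecord = {}  # any dict that can be serialized to JSON
with open('myfile.json', 'w') as oFile:  # open for writing ('w'), not reading ('r')
    # no indent and compact separators keep the record on a single line
    json.dump(dStatsRecord, oFile, separators=(',', ':'))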
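The snippet above writes a single record. For a file holding several records, a minimal sketch along the same lines is to emit one compact JSON object per line. The helper name to_json_lines and the second sample value are made up for illustration, and this assumes the table's S3 location should contain newline-delimited records.

```python
import json


def to_json_lines(records):
    """Serialize each dict as one compact JSON object followed by a newline."""
    return ''.join(
        json.dumps(record, separators=(',', ':')) + '\n'
        for record in records
    )


# Hypothetical sample data; only the first value comes from the question.
sample = [{'oldname': 'mau'}, {'oldname': 'dau'}]
payload = to_json_lines(sample)

with open('myfile.json', 'w') as out_file:
    out_file.write(payload)
```

Each line of the resulting file is then a complete record by itself, which is the layout described above.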