Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting HIVE_CURSOR_ERROR while querying the table created in Amazon Athena

I am getting the below error while querying the table created in Amazon Athena.

Error

HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2

The sample file which I am using and the query to create table is below. The table gets successfully created by the below query but when I am fetching the result from table I am getting the error. Please provide your valuable suggestion.

Note Sample Data Create table

like image 314
abc Avatar asked Jun 20 '17 07:06

abc


2 Answers

AWS Athena does not support multi-line JSON.

Athena knowledge center

Make sure your JSON record is on a single line

Athena doesn't currently support multi-line JSON records.

like image 143
jens walter Avatar answered Sep 23 '22 03:09

jens walter


I'm abusing the answer field to have more space and to have my thoughts on this a little bit structured. I hope this is useful input to anyone using Athena.

I'm using Athena to create two tables. Single-line-json-based and multi-line-json-based reports in two separate bucket folders and two according tables.

Single-line reports in JSON are straightforward. For table creation I use this driver:

ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe'

However if you use this exact one to consume multi-line you will get an error PLUS there are three more things (I, II and III) to be aware of. First of all the different driver:

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'

then

I the (invalid) JSON format. My multi-line files look like this:

{"key": "value"}
{"key": "value"}

Notice there is one JSON object per line and objects are NOT comma separated! You can read about this here:

Store multiple elements in json files in AWS Athena

II Create Statement

Contrary to your screenshot for table creation @abc I used it without nested structs:

CREATE EXTERNAL TABLE IF NOT EXISTS internal_reports (
`Date` Date,
impressions INT,
reach INT,
follower INT,
...
)

III Date property

I came across this whole driver-experimenting thing because the Date property caused issues in my single-line table because my values for Date had time information not equal to midnight. For the other multi-line reports my Dates were ok (the time was exactly midnight). You can read about this here:

https://developer.ibm.com/answers/questions/177238/timestamp-format-must-be-yyyy-mm-dd-hhmmssffffffff/ Athena - DATE column correct values from JSON

Cheers

like image 25
p0rter Avatar answered Sep 22 '22 03:09

p0rter