I am trying to convert a csv file from s3 into a table in Athena. When I run the query on Athena console it works but when I run it on Sagemaker Jupyter notebook with boto3 client it returns:
"**InvalidRequestException**: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:8: no viable alternative at input 'CREATE EXTERNAL'"
Here is my code
def run_query(query):
client = boto3.client('athena')
response = client.start_query_execution(
QueryString=query,
ResultConfiguration={
'OutputLocation': 's3://path/to/s3output',
}
)
print('Execution ID: ' + response['QueryExecutionId'])
return response
createTable = \
"""CREATE EXTERNAL TABLE TestTable (
ID string,
CustomerId string,
Ip string,
MessageFilename string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://bucket_name/results/csv/'
TBLPROPERTIES ("skip.header.line.count"="1")"""
response = run_query(createTable, s3_output)
print(response)
I have run queries through boto3 client in json format (so, using ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe') which have worked well but somehow this doesn't. I have tried changing names, syntax, quotes but that doesn't seem to work.
Any suggestion would be very appreciated, Thank you!
Thanks for sharing complete example. The issue is with the escaping in SERDEPROPERTIES. After modifying createTable as below it works
createTable = \
"""CREATE EXTERNAL TABLE testtable (
`id` string,
`customerid` string,
`ip` string,
`messagefilename` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\\\"',
'escapeChar' = '\\\\' )
STORED AS TEXTFILE
LOCATION 's3://bucket_name/results/csv/'
TBLPROPERTIES ("skip.header.line.count"="1");"""
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With