Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWS Athena csv metadata delimiter changed after first query use

I want to query s3 csv files to athena. Source csv file desc: (separator '|')

system information
val1|val2|val3|val4|val5|   

Base on that I create table in athena:

Create external table dbname.fromcsv
(
col1 string,
col2 string,
col3 string,
col4 string,
col5 string
)
 ROW FORMAT SERDE  'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = '|')
LOCATION
's3://mybucketloc/folder/'
TBLPROPERTIES ("skip.header.line.count"='1');

After creating table and querying athena everything looks fine, data are well separated etc.

First query result:

col1 col2  col3   col4 col5
val1 val2  val3   val4 val5

Next query against the same table is returning values with ',' separator stored in one column...

col1                     col2  col3   col4  col5
val1,val1,val,val4,val5   

And then In my s3 bucket additional files are added .csv.metadata

How to handle such case? I dont want to delete every single time .csv.metadata file ? Is there are way to keep the original create table definition? I've tried to change create table to ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' and many others...

like image 864
dzabba Avatar asked Sep 05 '25 03:09

dzabba


1 Answers

I believe your s3://mybucketloc/folder/ is the same as athena query results bucket. This bucket is created by Athena for storing files such as .csv.metadata etc. and is meant for its internal use, hence you have .csv.metadata files present in the same bucket as that of your csv files.

Easiest solution could be, have your csv files in a different s3 bucket and not in the athena-query-results bucket.

like image 59
Nick Avatar answered Sep 07 '25 20:09

Nick