I have this data stored in S3 as .csv (but it can be any other file format that best suits my requirement):
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
and I would like to have one coordinate per column, like this:
Coordinates:
1. 41.9100687,-87.8805614
2. 41.9802511,-87.8803253
3. 41.9806802,-87.8792417
After importing from S3 I choose CSV as the data type, and then I add a string column. But instead I get some weird table output. Besides this, I also tried to import it as a plain .txt file with a comma delimiter, and I get the same weird output.
What am I doing wrong here?
EDIT
The test column in the screenshot is from a query on another, but identical, example; the column should be gps_coordinates.
To reproduce your situation, I did the following:
1. Created a text file (gps.txt) with the sample data
2. Uploaded it to an Amazon S3 bucket (s3://my-bucket/gps/)

However, since the data has commas within each pair of numbers, I changed the SerDe to the OpenCSVSerDe (see OpenCSVSerDe for Processing CSV - Amazon Athena):
CREATE EXTERNAL TABLE IF NOT EXISTS default.gps (
`c1` string,
`c2` string,
`c3` string,
`c4` string,
`c5` string,
`c6` string,
`c7` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\")
LOCATION 's3://my-bucket/gps/'
TBLPROPERTIES ('has_encrypted_data'='false');
I was then able to successfully query the table. A sample column value is: 41.9100687,-87.8805614
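If you then want latitude and longitude as separate values rather than one "lat,lon" string per column, a follow-up query along these lines should work. This is only a sketch against the table defined above: the column names c1..c7 come from that DDL, and split_part() is a Presto/Trino string function available in Athena.

-- Split the first coordinate pair into separate latitude/longitude values.
-- split_part(string, delimiter, index) uses 1-based field indexes.
SELECT
  split_part(c1, ',', 1) AS latitude,
  split_part(c1, ',', 2) AS longitude
FROM default.gps;

The same expression can be repeated for c2 through c7, and the results can be wrapped in CAST(... AS double) if numeric types are needed.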