Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWS Athena Import CSV file

I have this data stored in S3 as .csv (but it can be any other file format which is the best suitable for my requirement):

"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",

and I would like to have one coordinate per column:

Like this:

Coordinates:

1.  41.9100687,-87.8805614

2.  41.9802511,-87.8803253

3.  41.9806802,-87.8792417

After importing S3 I choose CSV as data type... And then I add string column.

enter image description here

But instead I get some weird table output. Beside this I tried to import this as plain txt file with comma delimiter.. I get same weird output.

enter image description here

What am I doing wrong here?

EDIT

This test column screenshot is query from another but identical example. There should be gps_coordinates

like image 447
harunB10 Avatar asked Oct 20 '19 11:10

harunB10


People also ask

Does Athena support CSV?

Athena supports CSV output files only. The UNLOAD query writes query results from a SELECT statement to the specified data format. Supported formats for UNLOAD include Apache Parquet, ORC, Apache Avro, and JSON.

What data format does Amazon Athena support?

Q: What data formats does Amazon Athena support? Amazon Athena supports a wide variety of data formats like CSV, TSV, JSON, or Textfiles and also supports open source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats.

Can Athena query data from Amazon S3?

But unlike Apache Drill, Athena is limited to data only from Amazon’s own S3 storage service. However, Athena is able to query a variety of file formats, including, but not limited to CSV, Parquet, JSON, etc. In this pos t, we’ll see how we can setup a table in Athena using a sample data set stored in S3 as a .csv file.

How to create a table from a CSV file in Athena?

It is a good representation of a typical CSV file data you might have. I uploaded this pet.csv into an S3 location: S3://test-athena-linh/pet/, then headed to AWS Athena to create the table. (NOTE: run any required Drop table statement before running create statement).

What file formats can I query in Athena?

However, Athena is able to query a variety of file formats, including, but not limited to CSV, Parquet, JSON, etc. In this pos t, we’ll see how we can setup a table in Athena using a sample data set stored in S3 as a.csv file. But for this, we first need that sample CSV file. You can download it here.

How to create a table from Amazon Athena S3 bucket data?

As you can see from the screenshot, you have multiple options to create a table. For this post, we’ll stick with the basics and select the “Create table from S3 bucket data” option.So, now that you have the file in S3, open up Amazon Athena. You’ll get an option to create a table on the Athena home page.


Video Answer


1 Answers

To reproduce your situation, I did the following:

  • Created a text file using your sample data (gps.txt)
  • Uploaded it to an Amazon S3 bucket in its own folder (with no other files in that folder)
  • Created a table in Amazon Athena
    • Specified the location as the folder name (s3://my-bucket/gps/)
    • Specified 7 columns (since there are 7 string values in your sample file)

However, since the data has commas within each pair of numbers, I changed the SerDe to OpenCSVSerDe for Processing CSV - Amazon Athena:

CREATE EXTERNAL TABLE IF NOT EXISTS default.gps (
  `c1` string,
  `c2` string,
  `c3` string,
  `c4` string,
  `c5` string,
  `c6` string,
  `c7` string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\") 

LOCATION 's3://my-bucket/gps/'
TBLPROPERTIES ('has_encrypted_data'='false');

I was then able to successfully query the table. A sample column value is: 41.9100687,-87.8805614

like image 128
John Rotenstein Avatar answered Oct 10 '22 15:10

John Rotenstein