AWS Athena Import CSV file

Tags: amazon-web-services, amazon-athena

I have this data stored in S3 as a .csv file (though I can switch to whichever file format best suits my requirement):

"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",

and I would like to have one coordinate per column, like this:

Coordinates:

1.  41.9100687,-87.8805614

2.  41.9802511,-87.8803253

3.  41.9806802,-87.8792417

After importing from S3, I choose CSV as the data type and then add a string column.


But instead I get some weird table output. Besides this, I tried importing the data as a plain .txt file with a comma delimiter, and I get the same weird output.


What am I doing wrong here?

EDIT

The test column in the screenshot comes from a query against another, identical example. The column name should be gps_coordinates.


asked Oct 20 '19 11:10

harunB10


1 Answer

To reproduce your situation, I did the following:

- Created a text file using your sample data (gps.txt)
- Uploaded it to an Amazon S3 bucket in its own folder (with no other files in that folder)
- Created a table in Amazon Athena
  - Specified the location as the folder name (s3://my-bucket/gps/)
  - Specified 7 columns (since there are 7 string values in your sample file)

However, since the data has commas within each pair of numbers, I changed the SerDe to OpenCSVSerde, which respects quoted fields (see "OpenCSVSerDe for Processing CSV" in the Amazon Athena documentation):

CREATE EXTERNAL TABLE IF NOT EXISTS default.gps (
  `c1` string,
  `c2` string,
  `c3` string,
  `c4` string,
  `c5` string,
  `c6` string,
  `c7` string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\")
LOCATION 's3://my-bucket/gps/'
TBLPROPERTIES ('has_encrypted_data'='false');

I was then able to successfully query the table. A sample column value is: 41.9100687,-87.8805614
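
To illustrate why a quote-aware parser matters here, the following is a minimal Python sketch (not part of my Athena test). It uses Python's csv module only as a stand-in for quote-aware parsing, so it approximates rather than reproduces what the SerDe does. The variable names are illustrative, and the input is one row copied from the sample data in the question.

```python
import csv
import io

# One row copied from the sample data in the question (note the trailing comma).
line = (
    '"41.9100687,-87.8805614","41.9802511,-87.8803253",'
    '"41.9806802,-87.8792417","41.9810128,-87.8785121",'
    '"41.9200687,-87.8805614","41.9802511,-87.8803253",'
    '"41.9806802,-87.8792417",'
)

# Naive split on every comma: each quoted pair is cut in half.
naive_fields = line.split(",")

# Quote-aware parse: commas inside double quotes stay within the field.
quoted_fields = next(csv.reader(io.StringIO(line)))

# The trailing comma yields one empty field at the end; drop it.
coordinates = [field for field in quoted_fields if field]

# Optionally turn each "lat,lon" string into a pair of floats.
points = [tuple(float(part) for part in c.split(",")) for c in coordinates]

print(len(naive_fields))   # 15 fragments, e.g. '"41.9100687' and '-87.8805614"'
print(len(coordinates))    # 7 coordinate strings
print(coordinates[0])      # 41.9100687,-87.8805614
```

The 15 fragments from the naive split are the kind of result a plain comma delimiter would give, which is probably what showed up as the weird table output. The quote-aware parse returns the 7 values that map onto columns c1 to c7.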


answered Oct 10 '22 15:10

John Rotenstein
