Using tar.gz file as a source for Amazon Athena

Tags: amazon-web-services, amazon-s3, amazon-athena

If I define *.tsv files on Amazon S3 as the source for an Athena table and use OpenCSVSerde or LazySimpleSerDe as the deserializer, it works correctly. But if I define *.tar.gz files that contain the *.tsv files, I see several strange rows in the table (e.g. a row that contains the tsv file name, and several empty rows). What is the right way to use tar.gz files in Athena?
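To see where the odd rows come from, here is a minimal Python sketch (standard library only; the member name `data.tsv` and its contents are made up). It builds a .tar.gz in memory and then reads it the way a gzip-only reader would, assuming the reader simply gunzips the object and splits it on newlines without understanding the tar layout:

```python
import gzip
import io
import tarfile


def build_tar_gz(name: str, payload: bytes) -> bytes:
    """Pack a single file into an in-memory .tar.gz archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def lines_seen_by_gzip_reader(blob: bytes) -> list[bytes]:
    """Decompress the gzip layer only (no tar parsing) and split into rows.

    Assumption: this approximates a reader that knows gzip but not tar.
    """
    return gzip.decompress(blob).split(b"\n")


tsv = b"id\tname\n1\talice\n2\tbob\n"
rows = lines_seen_by_gzip_reader(build_tar_gz("data.tsv", tsv))
print(rows[0][:8])  # b'data.tsv'  (the tar header, glued to the first real line)
print(len(rows))    # 4: header+first line, two data lines, trailing NUL padding
```

The first "row" is the 512-byte tar header (which starts with the member name) merged with the first real line, and the last one is the NUL padding tar writes after the data, which lines up with the strange rows described above.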


asked Sep 20 '17 12:09

Alexander Ershov

1 Answer

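The problem is tar: it is an archive format, and Athena only decompresses gzip, so it reads the tar headers and padding as if they were data. That is where the extra rows come from (each file's header, which starts with the file name, plus the zero bytes tar pads after each file and at the end of the archive). Athena can read *.gz files, but not tar archives, so in this case I have to gzip each *.tsv file separately (*.tsv.gz) instead of packing them into a *.tar.gz.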
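A minimal sketch of that conversion in Python, using only the standard library. It assumes the archive fits on local disk and that member base names are unique (files with the same name in different directories would overwrite each other); the function name `tar_gz_to_gz_files` is hypothetical:

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def tar_gz_to_gz_files(archive: Path, out_dir: Path) -> list[Path]:
    """Re-pack every regular file in a .tar.gz as its own standalone .gz file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive, mode="r:gz") as tar:
        for member in tar:
            # Skip directories, links, etc.; only regular files hold TSV data.
            if not member.isfile():
                continue
            src = tar.extractfile(member)
            if src is None:
                continue
            # Flatten the path: "sub/b.tsv" becomes "b.tsv.gz" in out_dir.
            target = out_dir / (Path(member.name).name + ".gz")
            # Stream the member's bytes into a fresh gzip file (no tar headers).
            with src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

The resulting files can then be uploaded to the S3 prefix the table's LOCATION points at, for example with `aws s3 cp <out_dir> s3://<bucket>/<prefix>/ --recursive` (bucket and prefix are placeholders). Athena detects gzip-compressed text files by their .gz extension, so the existing SerDe settings should keep working.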


answered Sep 18 '22 00:09

Alexander Ershov

Related questions

SSL on Elastic Beanstalk
How "Real-Time" DynamoDB stream is?
Upload jpg to S3: "The request body terminated unexpectedly"
How do I get the Hosted Zone for a domain using Boto 3?
Swift 3: How to set multiple cookies for JWPlayer for HLS Streaming
How to upload documents to AWS CloudSearch with Boto3
Script or api to provide the ami-id of the latest amazon-ecs-optimized image
DynamoDB TTL: when are items removed
Can't register EC2 instance in ELB
Cognito User Pool Groups not working with different roles
AWS javascript SDK SES SendMail Illegal Address
Understanding AWS route-tables - cannot create a more specific route
DynamoDB Mapper mapping Collection Datatypes
AWS Lambda rename the function
S3A: fails while S3: works in Spark EMR
AWS EC2, pm2 : Cannot see pm2 running list
Syntax for filters for aws rds describe-db-instances
Rate limit AWS API gateway endpoint
On AWS S3, can I exclude a file from lifecycle rule
Changing the auto-generated kops kubernetes admin password
