
Amazon Athena and compressed S3 files

I have an S3 bucket with several zipped CSV files (utilization logs). I'd like to query this data with Athena, but the output is completely garbled.

It appears Athena is trying to parse the zip files without decompressing them first. Is it possible to force Hive to recognize my files as compressed data?

MattY asked Dec 19 '16 20:12



1 Answer

Athena supports compressed data, but zip archives are not among the supported formats. The supported formats are

  • Snappy (.snappy)
  • BZIP2 (.bz2)
  • GZIP (.gz)

Those formats are detected by their filename suffix. If the suffix doesn't match, the reader does not decode the content. I tested it with a test.csv.gz file and it worked right away. So try changing the compression from zip to gzip and it should work.
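
As a rough illustration of the zip-to-gzip conversion suggested above, here is a minimal Python sketch using only the standard library. The function name `zip_to_gzip` and the output layout (one `.gz` file per archive member, named after the member) are assumptions made for this example, not something Athena requires beyond the `.gz` suffix. It works on local files only.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def zip_to_gzip(zip_path, out_dir):
    """Re-compress every file inside a zip archive as its own .gz file.

    Hypothetical helper for illustration: returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue  # skip directory entries, only files carry data
            # Keep the member's base name and append ".gz" so the suffix
            # identifies the file as gzip, e.g. test.csv -> test.csv.gz
            target = out_dir / (Path(member.filename).name + ".gz")
            # Stream the bytes so large log files are not loaded into memory
            with archive.open(member) as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

The resulting `.gz` files would then replace the zip archives in the S3 location the table points to. Note that members with the same base name in different folders of the archive would overwrite each other in this sketch.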

jens walter answered Sep 24 '22 19:09