Specify compression type with Athena

Tags:

amazon-athena

I have data in S3 that is GZIP-compressed. I'm trying to create a table in Athena from these files, and my CREATE TABLE statement succeeds, but when I query the table all rows are empty.

create external table mydatabase.table1 (
   date date,
   week_begin_date date,
   week_end_date date,
   value float
)
row format delimited fields terminated by ','
stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 's3://my-bucket/some/path/'

How can I insist that Athena read my files as GZIP?


asked Feb 15 '18 22:02

1 Answer

Athena supports TBLPROPERTIES metadata: you can set properties in a CREATE TABLE statement, change them with ALTER TABLE, and display them for any table with SHOW TBLPROPERTIES. However, it does not respect the TBLPROPERTIES ('compressionType'='gzip') option.

There's no apparent way to force a particular compression or decompression algorithm. Athena attempts to identify compression from the file extension. A GZIP file with a .gz suffix will be readable; a GZIP file without that suffix will not.

Similarly, a query against an uncompressed file with a .gz suffix will fail. The reported error is:

HIVE_CURSOR_ERROR: incorrect header check
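Because detection depends only on the file name, it can help to check whether each object's name agrees with its contents before querying. The following is a minimal Python sketch, not part of the original answer: it compares the suffix with the two-byte GZIP magic number. The function name `check_object` and the sample keys are hypothetical, and the case-sensitive match on `.gz` is an assumption.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def check_object(key: str, head: bytes) -> str:
    """Compare an object's name with its leading bytes.

    `key` is the object name; `head` is at least the first two bytes
    of its content. Both names are hypothetical, for illustration only.
    """
    has_suffix = key.endswith(".gz")  # assumption: case-sensitive match
    is_gzip = head.startswith(GZIP_MAGIC)
    if is_gzip and not has_suffix:
        return "gzip data without .gz suffix"
    if has_suffix and not is_gzip:
        return "uncompressed data with .gz suffix"
    return "consistent"


if __name__ == "__main__":
    plain = b"2018-02-15,2018-02-11,2018-02-17,1.5\n"
    compressed = gzip.compress(plain)
    samples = [
        ("a.csv.gz", compressed),
        ("b.csv", compressed),
        ("c.csv.gz", plain),
        ("d.csv", plain),
    ]
    for key, body in samples:
        print(key, "->", check_object(key, body[:2]))
```

The second case corresponds to the unreadable rows described above, and the third to the "incorrect header check" error.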

Some investigation revealed the following:

The only known way to have Athena recognize a file as GZIP is to name it with a .gz suffix.
Other similar suffixes that do not work include .gzip, .zip, and [^.]gz (that is, a name ending in gz with no preceding dot).
GZIP and uncompressed files can live happily side by side in an Athena table or partition - the compression detection is done at the file level, not at the table level.
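Given these findings, the practical fix is to rename GZIP objects so their keys end in .gz. A rough Python sketch of planning such renames follows; the helper names `with_gz_suffix` and `plan_renames` and the sample keys are hypothetical. It only computes the new keys; performing the copy and delete in S3 is left out.

```python
def with_gz_suffix(key: str) -> str:
    """Return the key unchanged if it already ends in .gz, else append .gz."""
    return key if key.endswith(".gz") else key + ".gz"


def plan_renames(gzip_keys):
    """Map each GZIP object key that lacks a .gz suffix to its new key.

    `gzip_keys` is assumed to contain only keys of objects already known
    to be GZIP-compressed.
    """
    return {
        key: with_gz_suffix(key)
        for key in gzip_keys
        if not key.endswith(".gz")
    }


if __name__ == "__main__":
    # Hypothetical keys covering the suffixes listed above.
    keys = ["some/path/a.gzip", "some/path/bgz", "some/path/c.gz"]
    for old, new in plan_renames(keys).items():
        print(old, "->", new)
```

Keys that already end in .gz are left out of the plan, so uncompressed files and correctly named GZIP files in the same location are untouched.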


answered Sep 30 '22 12:09

Kirk Broadhurst
