Specifying compression codec for an INSERT OVERWRITE SELECT in Hive

I have a Hive table like:

 CREATE TABLE beacons
 (
     foo string,
     bar string,
     foonotbar string
 )
 COMMENT "Digest of daily beacons, by day"
 PARTITIONED BY ( day string COMMENT "In YYYY-MM-DD format" );

To populate, I am doing something like:

 SET hive.exec.compress.output=true;
 SET io.seqfile.compression.type=BLOCK;

 INSERT OVERWRITE TABLE beacons PARTITION ( day = "2011-01-26" )
 SELECT
   someFunc(query, "foo") as foo,
   someFunc(query, "bar") as bar,
   otherFunc(query, "foo||bar") as foonotbar
 FROM raw_logs
 WHERE day = "2011-01-26";

This builds a new partition whose output files are compressed with the default deflate codec, but the ideal here would be to go through the LZO compression codec instead.

Unfortunately I am not exactly sure how to accomplish that, but I assume it's one of the many runtime settings or perhaps just an additional line in the CREATE TABLE DDL.
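
One way to confirm which codec is currently in effect is to list the partition's files from the Hive CLI; the path below is an assumption (the default Hive warehouse location):

 -- List the partition's output files from the Hive CLI
 dfs -ls /user/hive/warehouse/beacons/day=2011-01-26;
 -- Files named like 000000_0.deflate indicate the default DeflateCodec is being applied.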

asked Jan 28 '11 by David


1 Answer

Before the INSERT OVERWRITE, set the following runtime configuration values:

SET hive.exec.compress.output=true; 
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
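
For example, applied to the INSERT from the question (someFunc and otherFunc are the question's placeholder UDFs), the full sequence would look something like:

 SET hive.exec.compress.output=true;
 SET io.seqfile.compression.type=BLOCK;
 SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

 INSERT OVERWRITE TABLE beacons PARTITION ( day = "2011-01-26" )
 SELECT
   someFunc(query, "foo") as foo,
   someFunc(query, "bar") as bar,
   otherFunc(query, "foo||bar") as foonotbar
 FROM raw_logs
 WHERE day = "2011-01-26";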

Also make sure you have the desired compression codec by checking:

io.compression.codecs
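
In the Hive CLI, issuing SET with just a property name prints its current value, so the registered codecs can be inspected with something like this (the exact list varies by cluster):

 SET io.compression.codecs;
 -- com.hadoop.compression.lzo.LzopCodec should appear in the printed list
 -- if the hadoop-lzo libraries are installed and registered.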

Further information about io.seqfile.compression.type can be found here: http://wiki.apache.org/hadoop/Hive/CompressedStorage

I may be mistaken, but it seemed like the BLOCK type would ensure larger chunks are compressed together at a higher ratio, versus a larger number of smaller, less-compressed pieces.
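
For reference, the valid values of io.seqfile.compression.type are the standard SequenceFile compression types:

 SET io.seqfile.compression.type=NONE;    -- no compression of record values
 SET io.seqfile.compression.type=RECORD;  -- compress each record's value individually
 SET io.seqfile.compression.type=BLOCK;   -- compress batches of records together (typically the best ratio)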

answered Sep 21 '22 by David