
Setting compression on a Hive table

I have a Hive table based on an Avro schema. The table was created with the following query:

CREATE EXTERNAL TABLE datatbl
PARTITIONED BY (date STRING, time INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url'='path to schema file on HDFS')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '<path on hdfs>'

So far we have been inserting data into the table after setting the following properties in the session:

hive> set hive.exec.compress.output=true;
hive> set avro.output.codec=snappy;

However, if someone forgets to set these two properties, the data is written uncompressed. Is there a way to enforce compression on the table itself, so that the data is always compressed even when the two properties are not set?

Vikas Saxena asked Oct 19 '22 05:10


1 Answer

Yes, you can set properties on the table itself with TBLPROPERTIES. Try the following:

 CREATE EXTERNAL TABLE datatbl
 PARTITIONED BY (date STRING, time INT)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 WITH SERDEPROPERTIES ('avro.schema.url'='path to schema file on HDFS')
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 LOCATION '<path on hdfs>'
 TBLPROPERTIES ("orc.compress"="SNAPPY");
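
If the table already exists, the property can probably be attached without recreating it. The sketch below assumes the table name datatbl from the question and uses the standard ALTER TABLE and SHOW TBLPROPERTIES statements. One caveat: orc.compress is the key documented for ORC storage, so whether the Avro writer picks up a table-level property is worth checking on your Hive version by inspecting a freshly written file. The session settings from the question are repeated as a fallback.

```sql
-- Sketch only: attach the property from the answer to the existing table.
-- The table name datatbl comes from the question.
ALTER TABLE datatbl SET TBLPROPERTIES ("orc.compress"="SNAPPY");

-- List the stored table properties to confirm the key was recorded.
-- This shows only that the property is stored, not that the writer uses it.
SHOW TBLPROPERTIES datatbl;

-- Fallback from the question: session-level settings issued before each INSERT.
SET hive.exec.compress.output=true;
SET avro.output.codec=snappy;
```

If new files still come out uncompressed after an insert run without the two SET statements, the table property is not being honored for this storage format and the session settings remain necessary.
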
dbustosp answered Oct 22 '22 23:10