
How to specify input file format when loading data into Hive

Tags: hadoop, hive

I am trying to load data from gzip archives into a Hive table, but my gzip files have a non-standard extension, for example:

apache_log.gz_localhost

When I specify the HDFS directory where these files are located, Hive doesn't recognize them as gzip-compressed because it only looks for files with the .gz extension.

Is it possible to define file type when loading data into Hive? Something like (PSEUDO):

set input.format=gzip;

LOAD DATA INPATH '/tmp/logs/' INTO TABLE apache_logs;

Here is my SQL for table creation:

CREATE EXTERNAL TABLE access_logs (
`ip`                STRING,
`time_local`        STRING,
`method`            STRING,
`request_uri`       STRING,
`protocol`          STRING,
`status`            STRING,
`bytes_sent`        STRING,
`referer`           STRING,
`useragent`         STRING,
`bytes_received`    STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex'='^(\\S+) \\S+ \\S+ \\[([^\\[]+)\\] "(\\w+) (\\S+) (\\S+)" (\\d+) (\\d+|\-) "([^"]+)" "([^"]+)".* (\\d+)'
)
STORED AS TEXTFILE
LOCATION '/tmp/logs/';
asked by antunovic, Jun 14 '13 10:06


1 Answer

Why not rename the files to xxx.gz after putting them into HDFS?

If you really want to support .gz_localhost, I think you can write your own codec based on GzipCodec to achieve it:
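
To make the renaming idea concrete, here is a minimal sketch of one possible naming rule. It assumes you want to keep the host suffix in the file name; the class `GzRename` and the helper `toGzName` are made up for illustration. The resulting name is what you would pass to `hdfs dfs -mv` (or `FileSystem.rename`) for each file.

```java
public class GzRename {
    // Hypothetical helper: moves whatever follows the last ".gz" in front of it,
    // e.g. "apache_log.gz_localhost" -> "apache_log_localhost.gz".
    // Names that already end in ".gz", or contain no ".gz", are returned unchanged.
    public static String toGzName(String name) {
        int idx = name.lastIndexOf(".gz");
        if (idx < 0 || name.endsWith(".gz")) {
            return name;
        }
        String base = name.substring(0, idx);
        String tail = name.substring(idx + ".gz".length());
        return base + tail + ".gz";
    }

    public static void main(String[] args) {
        String[] names = {"apache_log.gz_localhost", "apache_log.gz"};
        for (String name : names) {
            System.out.println(name + " -> " + toGzName(name));
        }
    }
}
```

Once the names end in .gz, the stock GzipCodec should pick them up without any configuration change.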

  1. Create your own NewGzipCodec class that extends GzipCodec:

    public class NewGzipCodec extends org.apache.hadoop.io.compress.GzipCodec { }

  2. Override the getDefaultExtension method inside that class:

    @Override
    public String getDefaultExtension() { return ".gz_localhost"; }

  3. Compile it with javac and package NewGzipCodec.class into NewGzipCodec.jar

  4. Copy NewGzipCodec.jar to $HADOOP_HOME/lib

  5. Register the codec in your core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>NewGzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
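
Why this works, as far as I understand it: Hadoop chooses a codec by checking whether the file name ends with the extension each registered codec reports from getDefaultExtension(). The sketch below is a simplified stand-alone model of that lookup, not Hadoop's actual code; the class `SuffixLookup` and the method `codecFor` are invented for illustration, and the extensions listed are the defaults of the stock codecs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SuffixLookup {
    // Returns the codec name registered for the first extension that the
    // file name ends with, or null when no registered extension matches.
    public static String codecFor(String fileName, Map<String, String> codecsByExtension) {
        for (Map.Entry<String, String> entry : codecsByExtension.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stock codecs only: nothing matches the custom suffix.
        Map<String, String> codecs = new LinkedHashMap<>();
        codecs.put(".deflate", "DefaultCodec");
        codecs.put(".gz", "GzipCodec");
        codecs.put(".bz2", "BZip2Codec");
        System.out.println(codecFor("apache_log.gz_localhost", codecs));

        // After registering the custom codec, the suffix is recognized.
        codecs.put(".gz_localhost", "NewGzipCodec");
        System.out.println(codecFor("apache_log.gz_localhost", codecs));
    }
}
```

In this model the first lookup finds no codec, so the file would be read as plain text, while the second resolves to the custom codec. That is also why the string returned in step 2 has to match the real file suffix exactly.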
answered by pensz, Oct 15 '22 10:10