I've been playing with Hive for a few days now, but I still have a hard time with partitions.
I've been recording Apache logs (combined format) in Hadoop for a few months. They are stored in raw text format, partitioned by date (via Flume): /logs/yyyy/mm/dd/hh/*
Example:
/logs/2012/02/10/00/Part01xx (02/10/2012 12:00 am)
/logs/2012/02/10/00/Part02xx
/logs/2012/02/10/13/Part0xxx (02/10/2012 01:00 pm)
The date in the combined log file follows this format: [10/Feb/2012:00:00:00 -0800]
How can I create an external table with partitions in Hive that uses my physical partitions? I can't find any good documentation on Hive partitioning. I found related questions such as:
If I load my logs into an external table with Hive, I cannot partition by time, since the date is not in the right format (Feb <=> 02). Even if it were in the right format, how would I transform a string like "10/02/2012:00:00:00 -0800" into multiple directories such as "/2012/02/10/00"?
I could eventually use a Pig script to convert my raw logs into Hive tables, but at that point I should just be using Pig instead of Hive for my reporting.
Apache Hive organizes tables into partitions. Partitioning is a way of dividing a table into related parts based on the values of particular columns such as date, city, and department. Each table in Hive can have one or more partition keys that identify a particular partition.
The syntax for adding partitions is:

ALTER TABLE table_name ADD [IF NOT EXISTS]
  PARTITION partition_spec [LOCATION 'location1']
  PARTITION partition_spec [LOCATION 'location2'] ...;

partition_spec:
  (p_column = p_col_value, p_column = p_col_value, ...)

For example, a query like the following adds a partition to an employee table (assuming the table is partitioned by a year column; the location is illustrative):
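-- Illustrative only: assumes employee is partitioned by a string column 'year'.
ALTER TABLE employee ADD IF NOT EXISTS
  PARTITION (year = '2012')
  LOCATION '/user/hive/warehouse/employee/year=2012';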
The general syntax for showing partitions is as follows:

SHOW PARTITIONS [db_name.]table_name [PARTITION(partition_spec)];
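For example, to list all partitions of the employee table above, or only those matching a given year (again assuming the illustrative year column):

SHOW PARTITIONS employee;
SHOW PARTITIONS employee PARTITION (year = '2012');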
Partitioning in Hive means dividing a table into parts based on the values of a particular column such as date, course, city, or country. The advantage of partitioning is that, since the data is stored in slices, a query that filters on the partition column only has to read the matching slices, so the query response time is faster.
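As an illustration, with a hypothetical page_views table partitioned by a dt string column, a query that filters on dt reads only the matching slice instead of the whole table:

-- Only the dt='2012-02-10' partition directory is scanned (partition pruning).
SELECT COUNT(*) FROM page_views
WHERE dt = '2012-02-10';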
If I understand correctly, you have files in folders 4 levels deep under the logs directory. In that case, you can define your table as external with the path 'logs', partitioned by 4 virtual fields: year, month, day_of_month, hour_of_day.
The partitioning is essentially done for you by Flume.
EDIT 3/9: A lot of the details depend on how exactly Flume writes files. But in general terms, your DDL should look something like this:
CREATE EXTERNAL TABLE table_name(fields...)
PARTITIONED BY(log_year STRING, log_month STRING,
    log_day_of_month STRING, log_hour_of_day STRING)
-- row format / SerDe description for your log lines goes here
STORED AS TEXTFILE
LOCATION '/your user path/logs';
EDIT 3/15: Per zzarbi's request, I'm adding a note that after the table is created, Hive needs to be informed about the partitions that exist. This needs to be done repeatedly as long as Flume or another process creates new partitions. See my answer to the Create external with Partition question.
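As a minimal sketch of what that looks like for the layout above (the table and partition column names follow the DDL sketch; the path prefix is the same placeholder), each new hourly directory is registered with an ADD PARTITION statement:

-- Registers the /2012/02/10/00 hourly directory as one partition.
ALTER TABLE table_name ADD IF NOT EXISTS
  PARTITION (log_year = '2012', log_month = '02',
             log_day_of_month = '10', log_hour_of_day = '00')
  LOCATION '/your user path/logs/2012/02/10/00';

Because the Flume directories are not named in the key=value form Hive expects, automatic discovery (e.g. MSCK REPAIR TABLE) will not pick them up; a small script that issues one such statement per new hour is the usual workaround.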