Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive - external (dynamically) partitioned table

Tags:

hive

hiveql

I'm having a table in MySQL viz. nas_comps.

select comp_code, count(leg_id) from nas_comps_01012011_31012011 n group by comp_code;
comp_code     count(leg_id)
'J'           20640
'Y'           39680

First, I imported data onto HDFSHadoop version 1.0.2) using Sqoop :

sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
--username hadoopranch \
--password hadoopranch \
--query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" \
-m 1 \
--target-dir /pros/olap2/dataimports/nas_comps

Then, I created an external, partitioned Hive table :

/*shows the partitions on 'describe' but not 'show partitions'*/
create external table  nas_comps(DS_NAME string,DEP_DATE string,
                                 CRR_CODE string,FLIGHT_NO string,ORGN string,
                                 DSTN string,PHYSICAL_CAP int,ADJUSTED_CAP int,
                                 CLOSED_CAP int)
PARTITIONED BY (LEG_ID int, month INT, COMP_CODE string)
location '/pros/olap2/dataimports/nas_comps'

The partition columns are shown when described :

hive> describe extended nas_comps;
OK
ds_name string
dep_date        string
crr_code        string
flight_no       string
orgn    string
dstn    string
physical_cap    int
adjusted_cap    int
closed_cap      int
leg_id  int
month   int
comp_code       string

Detailed Table Information      Table(tableName:nas_comps, dbName:pros_olap2_optim, 
owner:hadoopranch, createTime:1374849456, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string, comment:null), 
FieldSchema(name:dep_date, type:string, comment:null), FieldSchema(name:crr_code, 
type:string, comment:null), FieldSchema(name:flight_no, type:string, comment:null), 
FieldSchema(name:orgn, type:string, comment:null), FieldSchema(name:dstn, type:string, 
comment:null), FieldSchema(name:physical_cap, type:int, comment:null), 
FieldSchema(name:adjusted_cap, type:int, comment:null), FieldSchema(name:closed_cap, 
type:int, comment:null), FieldSchema(name:leg_id, type:int, comment:null), 
FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string, 
comment:null)], location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, 
numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:
{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:
[FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int,
comment:null), FieldSchema(name:comp_code, type:string, comment:null)], 
parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1374849456}, viewOriginalText:null, 
viewExpandedText:null, tableType:EXTERNAL_TABLE)

But I'm not sure if the partitions are created because :

hive> show partitions nas_comps;
OK
Time taken: 0.599 seconds


select count(1) from nas_comps;

returns 0 records

How do I create an external Hive table with dynamic partitions?

like image 708
Kaliyug Antagonist Avatar asked Jul 26 '13 09:07

Kaliyug Antagonist


People also ask

How do I enable dynamic partitioning in Hive?

use StudentData; Step2: Enable the dynamic partition by using the following commands: - set hive. exec. dynamic. partition=true; set hive.

What is dynamic partitioning in Hive?

Dynamic partitioning is the strategic approach to load the data from the non-partitioned table where the single insert to the partition table is called a dynamic partition.

What is difference between static and dynamic partitioning?

in static partitioning we need to specify the partition column value in each and every LOAD statement. dynamic partition allow us not to specify partition column value each time. the approach we follows is as below: create a non-partitioned table t2 and insert data into it.


2 Answers

Hive will not create the partitions for you this way.
Just create a table partitioned by the desired partition key, then execute insert overwrite table from the external table to the new partitioned table (setting hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict).

If you must keep the table partitioned externally you have to manually create the directories (1 directory per partition the name should be PARTION_KEY=VALUE) then use the MSCK REPAIR TABLE table_name; command

like image 157
dimamah Avatar answered Oct 10 '22 00:10

dimamah


Dynamic Partition

Partition gets added dynamically during inserting the record into hive table.

  1. Support only with insert statement.
  2. Not supported with load data statement.
  3. Need to enable dynamic partition settings before inserting data into hive table. hive.exec.dynamic.partition.mode=nonstrict default value is strict  hive.exec.dynamic.partition=true default value is false.

Dynamic Partition Query

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT INTO table_name PARTITION (loaded_date)
select * from table_name1 where loaded_date = 20151217

Here loaded_date = 20151217 is partition and its value.

Limitations:

  1. Dynamic partition will work with above statement only.
  2. It will create the partition dynamically as per the data which it select from the loaded_date column from the table_name1;

If your condition doesn't match with above criteria, then:

First create a partitioned table then do like this:

ALTER TABLE table_name ADD PARTITION (DS_NAME='partname1',DATE='partname2'); 

or Please use this Link for Dynamic Partition creation.

like image 43
Sandeep Singh Avatar answered Oct 09 '22 23:10

Sandeep Singh