Hive - external (dynamically) partitioned table

Tags:

hiveql

I'm having a table in MySQL viz. nas_comps.

select comp_code, count(leg_id) from nas_comps_01012011_31012011 n group by comp_code;
comp_code     count(leg_id)
'J'           20640
'Y'           39680

First, I imported data onto HDFSHadoop version 1.0.2) using Sqoop :

sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
--username hadoopranch \
--password hadoopranch \
--query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" \
-m 1 \
--target-dir /pros/olap2/dataimports/nas_comps

Then, I created an external, partitioned Hive table :

/*shows the partitions on 'describe' but not 'show partitions'*/
create external table  nas_comps(DS_NAME string,DEP_DATE string,
                                 CRR_CODE string,FLIGHT_NO string,ORGN string,
                                 DSTN string,PHYSICAL_CAP int,ADJUSTED_CAP int,
                                 CLOSED_CAP int)
PARTITIONED BY (LEG_ID int, month INT, COMP_CODE string)
location '/pros/olap2/dataimports/nas_comps'

The partition columns are shown when described :

hive> describe extended nas_comps;
OK
ds_name string
dep_date        string
crr_code        string
flight_no       string
orgn    string
dstn    string
physical_cap    int
adjusted_cap    int
closed_cap      int
leg_id  int
month   int
comp_code       string

Detailed Table Information      Table(tableName:nas_comps, dbName:pros_olap2_optim, 
owner:hadoopranch, createTime:1374849456, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string, comment:null), 
FieldSchema(name:dep_date, type:string, comment:null), FieldSchema(name:crr_code, 
type:string, comment:null), FieldSchema(name:flight_no, type:string, comment:null), 
FieldSchema(name:orgn, type:string, comment:null), FieldSchema(name:dstn, type:string, 
comment:null), FieldSchema(name:physical_cap, type:int, comment:null), 
FieldSchema(name:adjusted_cap, type:int, comment:null), FieldSchema(name:closed_cap, 
type:int, comment:null), FieldSchema(name:leg_id, type:int, comment:null), 
FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string, 
comment:null)], location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, 
numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:
{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:
[FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int,
comment:null), FieldSchema(name:comp_code, type:string, comment:null)], 
parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1374849456}, viewOriginalText:null, 
viewExpandedText:null, tableType:EXTERNAL_TABLE)

But I'm not sure if the partitions are created because :

hive> show partitions nas_comps;
OK
Time taken: 0.599 seconds


select count(1) from nas_comps;

returns 0 records

How do I create an external Hive table with dynamic partitions?

708

asked Jul 26 '13 09:07

2 Answers

Hive will not create the partitions for you this way.
Just create a table partitioned by the desired partition key, then execute insert overwrite table from the external table to the new partitioned table (setting hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict).

If you must keep the table partitioned externally you have to manually create the directories (1 directory per partition the name should be PARTION_KEY=VALUE) then use the MSCK REPAIR TABLE table_name; command

157

answered Oct 10 '22 00:10

dimamah

Dynamic Partition

Partition gets added dynamically during inserting the record into hive table.

Support only with insert statement.
Not supported with load data statement.
Need to enable dynamic partition settings before inserting data into hive table. hive.exec.dynamic.partition.mode=nonstrict default value is strict hive.exec.dynamic.partition=true default value is false.

Dynamic Partition Query

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT INTO table_name PARTITION (loaded_date)
select * from table_name1 where loaded_date = 20151217

Here loaded_date = 20151217 is partition and its value.

Limitations:

Dynamic partition will work with above statement only.
It will create the partition dynamically as per the data which it select from the loaded_date column from the table_name1;

If your condition doesn't match with above criteria, then:

First create a partitioned table then do like this:

ALTER TABLE table_name ADD PARTITION (DS_NAME='partname1',DATE='partname2');

or Please use this Link for Dynamic Partition creation.

answered Oct 09 '22 23:10

Sandeep Singh

Related questions
                            
                                GUI for using Hadoop [closed]
                            
                                Invalid table alias or column reference b
                            
                                Automated Testing in Apache Hive
                            
                                spark on yarn, Container exited with a non-zero exit code 143
                            
                                Does the column order matter in INSERT OVERWRITE statement in Hive?
                            
                                How to specify column list in hive insert into query
                            
                                How to use sqoop to export the default hive delimited output?
                            
                                Wrong result for count(*) in hive table
                            
                                How can I find last modified timestamp for a table in Hive?
                            
                                Hive describe partitions to show partition url
                            
                                Create hive external table from partitioned parquet files in Azure HDInsights
                            
                                exporting Hive table to csv in hdfs
                            
                                Read ORC files directly from Spark shell
                            
                                Hadoop on cassandra database
                            
                                ClassNotFoundException: org.apache.hive.jdbc.HiveDriver
                            
                                Hive - Varchar vs String , Is there any advantage if the storage format is Parquet file format
                            
                                array cast Java 8 vs Java 9
                            
                                Find last day of a month in Hive
                            
                                output/echo a meesage in hql/ hive query language
                            
                                Hive doesn't read partitioned parquet files generated by Spark

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Hive - external (dynamically) partitioned table

Tags:

hive

hiveql

Kaliyug Antagonist

People also ask

2 Answers

dimamah

Sandeep Singh

Recent Activity

Donate For Us