I have a table in MySQL, viz. nas_comps.
select comp_code, count(leg_id) from nas_comps_01012011_31012011 n group by comp_code;
comp_code count(leg_id)
'J' 20640
'Y' 39680
First, I imported the data onto HDFS (Hadoop version 1.0.2) using Sqoop:
sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
--username hadoopranch \
--password hadoopranch \
--query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" \
-m 1 \
--target-dir /pros/olap2/dataimports/nas_comps
Then, I created an external, partitioned Hive table:
/*shows the partitions on 'describe' but not 'show partitions'*/
CREATE EXTERNAL TABLE nas_comps (
    DS_NAME string, DEP_DATE string,
    CRR_CODE string, FLIGHT_NO string, ORGN string,
    DSTN string, PHYSICAL_CAP int, ADJUSTED_CAP int,
    CLOSED_CAP int
)
PARTITIONED BY (LEG_ID int, month INT, COMP_CODE string)
LOCATION '/pros/olap2/dataimports/nas_comps';
The partition columns are shown when the table is described:
hive> describe extended nas_comps;
OK
ds_name string
dep_date string
crr_code string
flight_no string
orgn string
dstn string
physical_cap int
adjusted_cap int
closed_cap int
leg_id int
month int
comp_code string
Detailed Table Information Table(tableName:nas_comps, dbName:pros_olap2_optim,
owner:hadoopranch, createTime:1374849456, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string, comment:null),
FieldSchema(name:dep_date, type:string, comment:null), FieldSchema(name:crr_code,
type:string, comment:null), FieldSchema(name:flight_no, type:string, comment:null),
FieldSchema(name:orgn, type:string, comment:null), FieldSchema(name:dstn, type:string,
comment:null), FieldSchema(name:physical_cap, type:int, comment:null),
FieldSchema(name:adjusted_cap, type:int, comment:null), FieldSchema(name:closed_cap,
type:int, comment:null), FieldSchema(name:leg_id, type:int, comment:null),
FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string,
comment:null)], location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false,
numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:
{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:
[FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int,
comment:null), FieldSchema(name:comp_code, type:string, comment:null)],
parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1374849456}, viewOriginalText:null,
viewExpandedText:null, tableType:EXTERNAL_TABLE)
But I'm not sure whether the partitions were actually created, because:
hive> show partitions nas_comps;
OK
Time taken: 0.599 seconds
select count(1) from nas_comps;
returns 0 records
How do I create an external Hive table with dynamic partitions?
Dynamic partitioning loads data from a non-partitioned table into a partitioned table, with a single INSERT creating the partitions on the fly. First enable it:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
With static partitioning you have to specify the partition column value in each and every LOAD/INSERT statement; dynamic partitioning lets you skip that. The usual approach is: create a non-partitioned staging table t2, load the raw data into it, and then insert from t2 into the partitioned table.
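As a rough illustration of the difference, here is a minimal sketch; the partitioned target table t1 and its columns are hypothetical, and only the staging table t2 comes from the text above:

-- hypothetical tables: t2(name string, city string) is the non-partitioned staging
-- table, t1 is partitioned by city
CREATE TABLE t1 (name string) PARTITIONED BY (city string);

-- static partitioning: the partition value is spelled out in every statement
INSERT OVERWRITE TABLE t1 PARTITION (city='A')
SELECT name FROM t2 WHERE city = 'A';

-- dynamic partitioning: Hive takes the partition value from the last select column
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE t1 PARTITION (city)
SELECT name, city FROM t2;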
Hive will not create the partitions for you this way. Just create a table partitioned by the desired partition keys, then execute an INSERT OVERWRITE from the external table into the new partitioned table (with hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict set).
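A minimal sketch of that flow for this case; nas_comps_staging is a hypothetical non-partitioned table declared over the Sqoop output (with leg_id and comp_code as ordinary columns), and the month value is derived from dep_date here purely as an illustration:

-- target table, partitioned the way the question wants
CREATE TABLE nas_comps_part (
    ds_name string, dep_date string, crr_code string, flight_no string,
    orgn string, dstn string, physical_cap int, adjusted_cap int, closed_cap int
)
PARTITIONED BY (leg_id int, month int, comp_code string);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- the dynamic partition columns must be the last ones in the select,
-- in the same order as in PARTITIONED BY
INSERT OVERWRITE TABLE nas_comps_part PARTITION (leg_id, month, comp_code)
SELECT ds_name, dep_date, crr_code, flight_no, orgn, dstn,
       physical_cap, adjusted_cap, closed_cap,
       leg_id, month(dep_date), comp_code
FROM nas_comps_staging;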
If you must keep the table partitioned externally, you have to create the directories manually (one directory per partition, named PARTITION_KEY=VALUE) and then run MSCK REPAIR TABLE table_name; to register them.
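For that external route, a sketch of what the layout and the repair step might look like; the leg_id/month/comp_code values in the paths are purely illustrative:

-- expected layout under the table location, one directory per partition:
--   /pros/olap2/dataimports/nas_comps/leg_id=123/month=1/comp_code=J/<data files>
-- once the directories and data files are in place, let Hive discover them:
MSCK REPAIR TABLE nas_comps;
-- or register a single partition explicitly:
ALTER TABLE nas_comps ADD PARTITION (leg_id=123, month=1, comp_code='J')
LOCATION '/pros/olap2/dataimports/nas_comps/leg_id=123/month=1/comp_code=J';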
Dynamic Partition
Partitions get added dynamically while records are inserted into the Hive table; this works with INSERT statements, not with the LOAD DATA statement.
hive.exec.dynamic.partition=true (default value is false)
hive.exec.dynamic.partition.mode=nonstrict (default value is strict)
Dynamic Partition Query:
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT INTO table_name PARTITION (loaded_date)
SELECT * FROM table_name1 WHERE loaded_date = 20151217;
Here loaded_date is the partition column and 20151217 is its value.
Limitations:
This works only if the loaded_date column exists in table_name1; with SELECT *, Hive takes the dynamic partition value from the last column of the select, so loaded_date must be the last column of table_name1.
If your situation doesn't match the above criteria, first create a partitioned table and then add the partition statically, like this:
ALTER TABLE table_name ADD PARTITION (DS_NAME='partname1',DATE='partname2');
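A short sketch of that fallback, keeping this answer's generic names; col1/col2 are hypothetical placeholders for the non-partition columns:

-- register the partition up front (values are illustrative)...
ALTER TABLE table_name ADD PARTITION (loaded_date='20151217');
-- ...then load it with a static partition spec; the partition column itself
-- is not part of the select list
INSERT OVERWRITE TABLE table_name PARTITION (loaded_date='20151217')
SELECT col1, col2 FROM table_name1;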
Alternatively, refer to the Hive documentation on dynamic partition creation.