Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

creating partition in external table in hive

I have successfully created and added Dynamic partitions in an Internal table in hive. i.e. by using following steps:

1-created a source table

2-loaded data from local into source table

3- created another table with partitions - partition_table

4- inserted the data to this table from source table resulting in creation of all the partitions dynamically

My question is, how to perform this in external table? I read so many articles on this, but i am confused , that do I have to specify path to the already existing partitions for creating partitions for external table??

example: Step 1:

create external table1 ( name string, age int, height int)
location 'path/to/dataFile/in/HDFS';

Step 2:

alter table table1 add partition(age) 
location 'path/to/already/existing/partition'

I am not sure how to proceed with partitioning in external tables. Can somebody please help by giving step by step description of the same?.

Thanks in advance!

like image 667
Anoop Mamgain Avatar asked Sep 15 '15 07:09

Anoop Mamgain


People also ask

Can we partition external table in Hive?

we can't perform alter on the Dynamic partition. You can perform dynamic partition on hive external table and managed table. If you want to use the Dynamic partition in the hive then the mode is in non-strict mode.

Can we create partition on external table?

Partitioning external tables works in the same way as in managed tables. Except this in the external table, when you delete a partition, the data file doesn't get deleted.

What is default Hive partition in external table?

The HIVE_DEFAULT_PARTITION in hive is represented by a NULL value of the partitioned column. That means, if we have a NULL value for a partition column and loading this record to a partitioned table, then hive_default_partition will get create for that record.


1 Answers

Yes, you have to tell Hive explicitly what is your partition field.

Consider you have a following HDFS directory on which you want to create a external table.

/path/to/dataFile/

Let's say this directory already have data stored(partitioned) department wise as follows:

/path/to/dataFile/dept1
/path/to/dataFile/dept2
/path/to/dataFile/dept3

Each of these directories have bunch of files where each file contains actual comma separated data for fields say name,age,height.

e.g.
    /path/to/dataFile/dept1/file1.txt
    /path/to/dataFile/dept1/file2.txt

Now let's create external table on this:

Step 1. Create external table:

CREATE EXTERNAL TABLE testdb.table1(name string, age int, height int)
PARTITIONED BY (dept string)
ROW FORMAT DELIMITED
STORED AS TEXTFILE
LOCATION '/path/to/dataFile/';

Step 2. Add partitions:

ALTER TABLE testdb.table1 ADD PARTITION (dept='dept1') LOCATION '/path/to/dataFile/dept1';
ALTER TABLE testdb.table1 ADD PARTITION (dept='dept2') LOCATION '/path/to/dataFile/dept2';
ALTER TABLE testdb.table1 ADD PARTITION (dept='dept3') LOCATION '/path/to/dataFile/dept3';

Done, run select query once to verify if data loaded successfully.

like image 150
Sachin Gaikwad Avatar answered Oct 11 '22 13:10

Sachin Gaikwad