Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unable to load data in Hive partitioned table

Tags:

hadoop

hive

I have created a table in Hive with the following query:

create table if not exists employee(CASE_NUMBER String,
                                         CASE_STATUS String,
                                         CASE_RECEIVED_DATE DATE,
                                         DECISION_DATE  DATE,
                                         EMPLOYER_NAME STRING,
                                         PREVAILING_WAGE_PER_YEAR BIGINT,
                                         PAID_WAGE_PER_YEAR BIGINT,
                                         order_n int) partitioned by (JOB_TITLE_SUBGROUP STRING) row format delimited fields terminated by ',';
                                         

I tried loading data into the create table using below query:

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee  partition (JOB_TITLE_SUBGROUP);

For the partitioned table, I have even set following configuration :

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

But I am getting below error while executing the load query:

Your query has the following error(s):

Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [job_title_subgroup, ], values [])

Please help.

like image 993
Mohit Sudhera Avatar asked Aug 07 '16 16:08

Mohit Sudhera


1 Answers

If you want to load data into a Hive partition, you have to provide the value of the partition itself in the LOAD DATA query. So in this case, your query would be something like this.

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP="Value");

Where "Value" is the name of the partition in which you are loading your data. The reason is because Hive will use "Value" to create the directory in which your .csv is going to be stored, which will be something like this: .../employee/JOB_TITLE_SUBGROUP=Value. I hope this helps.

Check the documentation for details on the LOAD DATA syntax.

EDITED

Since the table has dynamic partition, one solution would be loading the .csv into an external table (e.g. employee_external) and then execute an INSERT command like this:

INSERT OVERWRITE INTO TABLE employee PARTITION(JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, (...), JOB_TITLE_SUBGROUP
FROM employee_external
like image 118
Jaime Caffarel Avatar answered Sep 23 '22 22:09

Jaime Caffarel