Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating a partitioned hive table from a non partitioned table

Tags:

hive

partition

I have a Hive table which was created by joining data from multiple tables. The data for this resides in a folder which has multiple files ("0001_1" , "0001_2", ... and so on). I need to create a partitioned table based on a date field in this table called pt_dt (either by altering this table or creating a new one). Is there a way to do this?

I've tried creating a new table and inserting into it (below) which did not work

create external table table2 (acct_id bigint, eval_dt string)
partitioned by (pt_dt string);
insert into table2
partition (pt_dt) 
select acct_id, eval_dt, pt_dt
from jmx948_variable_summary;

This throws the error

"FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 189 Cumulative CPU: 401.68 sec HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 6 minutes 41 seconds 680 msec"

like image 818
veemo Avatar asked Aug 13 '15 20:08

veemo


People also ask

Can we add partitions in existing non partitioned table in Hive?

Yep, create a new one defined the way you want the partitions to be and then insert into that new one using dynamic partitioning and you'll be good to go.


1 Answers

Was able to figure it out after some trial & error.

Enable dynamic partitioning in Hive:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

Create schema for partitioned table:

CREATE TABLE table1 (id STRING, info STRING)
PARTITIONED BY ( tdate STRING);

Insert into partitioned table :

FROM table2 t2
INSERT OVERWRITE TABLE table1 PARTITION(tdate)
SELECT t2.id, t2.info, t2.tdate
DISTRIBUTE BY tdate;
like image 159
veemo Avatar answered Oct 14 '22 08:10

veemo