I have a table test_details with about 4 million records. Using the data in this table, I have to create a new partitioned table test_details_par with records partitioned on visit_date. Creating the table is not a challenge, but when I come to the part where I have to INSERT the data using dynamic partitions, Hive gives up when I try to insert data for a larger number of days. If I do it for 2 or 3 days, the MapReduce job runs successfully, but for more days it fails with a Java heap space error or a GC error.
A simplified snapshot of my DDLs is as follows:
CREATE TABLE test_details_par (visit_id INT, store_id SMALLINT) PARTITIONED BY (visit_date DATE);
INSERT INTO TABLE test_details_par PARTITION (visit_date) SELECT visit_id, store_id, visit_date FROM test_details DISTRIBUTE BY visit_date;
I have tried setting these parameters so that Hive executes the job better:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
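For completeness, the per-node cap is usually raised together with the related global limits; a hedged sketch of the full set of session settings, assuming the standard Hive property names for these limits:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- global cap on dynamic partitions created by the whole job
set hive.exec.max.dynamic.partitions=10000;
-- cap on dynamic partitions created by a single mapper/reducer
set hive.exec.max.dynamic.partitions.pernode=10000;
-- cap on the total number of files the job may create
set hive.exec.max.created.files=100000;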
Is there anything I am missing that would let me run the INSERT for the complete batch without specifying the dates explicitly?
Neels,
Hive 12 and below have well-known scalability issues with dynamic partitioning that will be addressed with Hive 13. The problem is that Hive attempts to hold a file handle open for each and every partition it writes out, which causes out of memory and crashes. Hive 13 will sort by partition key so that it only needs to hold one file open at a time.
You have 3 options as I see it:
1. Change your job to insert only a few partitions at a time.
2. Wait for Hive 13 to be released and try that.
3. If you know how, build Hive from trunk and use it to complete your data load.
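As a hedged illustration of batching the load and of the Hive 13 sort-based behavior described above (hive.optimize.sort.dynamic.partition is the setting assumed for that release line; the date range below is a hypothetical one-week batch, not from the original post):

-- available from Hive 0.13 onward: sort rows by partition key so only one file is open at a time
set hive.optimize.sort.dynamic.partition=true;

-- load one bounded batch of partitions per run (hypothetical range; adjust to your data)
INSERT INTO TABLE test_details_par PARTITION (visit_date)
SELECT visit_id, store_id, visit_date
FROM test_details
WHERE visit_date >= '2014-01-01' AND visit_date < '2014-01-08'
DISTRIBUTE BY visit_date;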