I have a Hive table partitioned on date. I want to selectively overwrite the partitions for the last 'n' days (or a custom list of partitions). Is there a way to do it without writing an "INSERT OVERWRITE DIRECTORY" statement for each partition? Any help is greatly appreciated.

Hive : Insert overwrite multiple partitions

1 Answer

Hive supports dynamic partitioning, so you can build a query where the partition is just one of the source fields.

INSERT OVERWRITE TABLE dst PARTITION (dt)
SELECT col0, col1, ... coln, dt FROM src WHERE ...

The WHERE clause selects which values of dt get overwritten; partitions that receive no rows from the query are left untouched.
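As a rough sketch of the two cases in the question (the last n days, or a hand-picked list), the WHERE clause might look like the following. It assumes dt is stored as a 'yyyy-MM-dd' string, that nonstrict mode is already enabled for the session, and a Hive version that provides current_date (1.2 or later). The column names, the 7-day window, and the dates are placeholders, not values from the question.

```sql
-- Case 1: overwrite the partitions for the last 7 days.
-- CAST keeps the comparison string-to-string, since dt is assumed to be a string.
INSERT OVERWRITE TABLE dst PARTITION (dt)
SELECT col0, col1, dt
FROM src
WHERE dt >= CAST(date_sub(current_date, 7) AS STRING);

-- Case 2: overwrite a custom list of partitions (dates are hypothetical).
INSERT OVERWRITE TABLE dst PARTITION (dt)
SELECT col0, col1, dt
FROM src
WHERE dt IN ('2015-06-01', '2015-06-03', '2015-06-10');
```

Either way it is a single statement, no matter how many partitions the filter matches.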

Just include the partition field (dt in this case) last in the SELECT list. You can even use SELECT *, dt if the dt field is already part of the source, or SELECT *, my_udf(dt) AS dt, and so on.

By default, Hive requires at least one of the partition columns in the PARTITION clause to be static (strict mode), but you can switch it to nonstrict. For the query above, set the following before running it:

set hive.exec.dynamic.partition.mode=nonstrict;
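If you would rather stay in strict mode, giving one partition column a static value is enough to satisfy the check. Here is a minimal sketch, assuming a hypothetical table dst2 partitioned by (country, dt) and a hypothetical source src2 that has a country column; neither appears in the question, and the dates are placeholders.

```sql
-- Dynamic partitioning itself must be enabled (it is on by default in recent Hive releases).
SET hive.exec.dynamic.partition=true;

-- country is static, dt is dynamic; static partition columns must come
-- before dynamic ones in the PARTITION clause.
INSERT OVERWRITE TABLE dst2 PARTITION (country = 'US', dt)
SELECT col0, col1, dt
FROM src2
WHERE country = 'US'
  AND dt IN ('2015-06-01', '2015-06-03');
```

The static column is not repeated in the SELECT list; only the dynamic column dt appears, and it still goes last.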

answered Sep 22 '22 18:09

libjack

Tags:

hadoop

hive

rahul
