I have a Hive table, which is partitioned by column dt
. I need to add a partition if it does not exists, for exmaple, dt='20181219'
.
Now I'm using HiveMetaStoreClient#getPartition(dbName, tableName, 20181219)
. If the partition does not exists, then catch NoSuchObjectException
and add it.
Is there any elegant way to achieve this in Java?
You can run the HDFS list command to show all partition folders of a table from the Hive data warehouse location. This option is only helpful if you have all your partitions of the table are at the same location.
You can execute " msck repair table <table_name> " command to find out missing partition in Hive Metastore and it will also add partitions if underlying HDFS directories are present.
Hive organizes tables into partitions. It is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partition, it is easy to query a portion of the data.
Use add_partition(Partition, ifNotExists, needResults)
(javadoc) ... which (if the 2nd argument is true
) will only create an partition if it doesn't already exist.
Alternatively, just use add_partition(Partition)
to add the partition without a test, and catch the AlreadyExistsException
if it occurs.
Any scheme that involves testing that an action is possible and then doing that action has a potential race condition. In between the "test" and the "do", some other agent (thread, client, whatever) could do an action (the same one or a different one) that will cause your attempt to fail.
So your current approach is not only ugly. It also has a potential race condition if partitions could be created by multiple agents.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With