Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to update partition metadata in Hive , when partition data is manualy deleted from HDFS

What is the way to automatically update the metadata of Hive partitioned tables?

If new partition data's were added to HDFS (without alter table add partition command execution) . then we can sync up the metadata by executing the command 'msck repair'.

What to be done if a lot of partitioned data were deleted from HDFS (without the execution of alter table drop partition commad execution).

What is the way to syncup the Hive metatdata?

like image 243
vinu.m.19 Avatar asked Jan 14 '14 07:01

vinu.m.19


People also ask

How do I refresh metadata in Hive?

To flush the metadata for all tables, use the INVALIDATE METADATA command. Because REFRESH table_name only works for tables that the current Impala node is already aware of, when you create a new table in the Hive shell, enter INVALIDATE METADATA new_table before you can see the new table in impala-shell.

How do I sync metadata in Hive?

What is the way to automatically update the metadata of Hive partitioned tables? If new partition data's were added to HDFS (without alter table add partition command execution) . then we can sync up the metadata by executing the command 'msck repair'.


1 Answers

EDIT : Starting with Hive 3.0.0 MSCK can now discover new partitions or remove missing partitions (or both) using the following syntax :

MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS] 

This was implemented in HIVE-17824


As correctly stated by HakkiBuyukcengiz, MSCK REPAIR doesn't remove partitions if the corresponding folder on HDFS was manually deleted, it only adds partitions if new folders are created.

Extract from offical documentation :

In other words, it will add any partitions that exist on HDFS but not in metastore to the metastore.

This is what I usually do in the presence of external tables if multiple partitions folders are manually deleted on HDFS and I want to quickly refresh the partitions :

  • Drop the table (DROP TABLE table_name) (dropping an external table does not delete the underlying partition files)
  • Recreate the table (CREATE EXTERNAL TABLE table_name ...)
  • Repair it (MSCK REPAIR TABLE table_name)

Depending on the number of partitions this can take a long time. The other solution is to use ALTER TABLE DROP PARTITION (...) for each deleted partition folder but this can be tedious if multiple partitions were deleted.

like image 104
cheseaux Avatar answered Oct 25 '22 08:10

cheseaux