
MSCK REPAIR hive external tables

Tags:

hive

I have a daily ingestion of data into HDFS. From that data I generate Hive external tables partitioned by date. My question is: should I run MSCK REPAIR TABLE tablename after each data ingestion, in which case I have to run the command every day, or is running it once at table creation enough? Thanks a lot for your answers.

Best regards

scalacode asked Jun 13 '18 08:06


People also ask

What does MSCK repair table do in Hive?

The MSCK REPAIR TABLE command scans a file system such as Amazon S3 for Hive compatible partitions that were added to the file system after the table was created. MSCK REPAIR TABLE compares the partitions in the table metadata and the partitions in S3.

What does MSCK stand for in Hive?

Similar to how fsck stands for filesystem consistency check, msck is Hive's metastore consistency check.

What is MSCK in Metastore?

MSCK REPAIR TABLE recovers all the partitions in the directory of a table and updates the Hive metastore. When creating a table using PARTITIONED BY clause, partitions are generated and registered in the Hive metastore.


1 Answer

You only need to run MSCK REPAIR TABLE when the structure or the partitions of the external table have changed. The command updates the table's metadata in the Hive metastore.

A common example:

Suppose you partition the table by a field dt that represents a date.

  • Yesterday, you loaded data for dt=2018-06-12, so you should run MSCK REPAIR TABLE to update the metadata and make Hive aware of the new partition dt=2018-06-12.
  • Today, you load data for dt=2018-06-13, so you should run MSCK REPAIR TABLE again to make Hive aware of the new partition dt=2018-06-13.

So with a daily ingestion that creates a new dt partition each day, you run the command after each ingestion.
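The two daily steps above can be sketched in HiveQL as follows. This is only an illustration: the table name events, its columns, and the location /data/events are hypothetical and not taken from the question. It also shows ALTER TABLE ... ADD PARTITION as an alternative to a full repair, on the assumption that the daily job knows which date it has just written.

```sql
-- Hypothetical external table partitioned by date (names are assumptions).
CREATE EXTERNAL TABLE IF NOT EXISTS events (
  id      STRING,
  payload STRING
)
PARTITIONED BY (dt STRING)
LOCATION '/data/events';

-- Assume the daily ingestion has just written files under
-- /data/events/dt=2018-06-13/ directly on HDFS.

-- Option 1: scan the table location and register every partition
-- directory that is missing from the metastore.
MSCK REPAIR TABLE events;

-- Option 2: register only the partition that was just loaded.
-- With no explicit LOCATION, Hive expects it under the table
-- location as dt=2018-06-13.
ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt='2018-06-13');

-- Check which partitions the metastore now knows about.
SHOW PARTITIONS events;
```

Either option needs to run after each ingestion that creates a new partition directory. Option 2 avoids rescanning the whole table location, which may matter once the table has many partitions.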
kkpoon answered Nov 15 '22 08:11