Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does MSCK REPAIR TABLE do behind the scenes and why it's so slow?

I know that MSCK REPAIR TABLE updates the metastore with the current partitions of an external table.

To do that, you only need to do ls on the root folder of the table (given the table is partitioned by only one column), and get all its partitions, clearly a < 1s operation.

But in practice, the operation can take a very long time to execute (or even timeout if ran on AWS Athena).

So my question is, what does MSCK REPAIR TABLE actually do behind the scenes and why?

How does MSCK REPAIR TABLE find the partitions?


Additional data in case it's relevant:

Our data is all on S3, it's both slow when running on EMR (Hive) or Athena (Presto), there are ~450 partitions in the table, every partition has on avg 90 files, overall 3 Gigabytes for a partition, files are in Apache parquet format

like image 491
gdoron is supporting Monica Avatar asked Dec 07 '18 10:12

gdoron is supporting Monica


People also ask

Why does MSCK repair table take so long?

MSCK is slow because it is waiting for S3 listing among other things. Athena would read the entire 9.7MB table if it were in one file faster than it would be able to list that huge directory structure. I recommend removing the partitions completely, or if you really must have them then remove p2, p3 and p4 levels.

What does MSCK repair table do?

The MSCK REPAIR TABLE command scans a file system such as Amazon S3 for Hive compatible partitions that were added to the file system after the table was created. MSCK REPAIR TABLE compares the partitions in the table metadata and the partitions in S3.

What does alter table recover partitions do?

ALTER TABLE RECOVER PARTITIONS is the command that is widely used in Hive to refresh partitions as new partitions are directly added to the file system by other users. Qubole has added path validation to the ALTER TABLE RECOVER PARTITIONS command. The command only recovers valid paths.

What does load partitions do in Athena?

By partitioning your data, you can restrict the amount of data scanned by each query, thus improving performance and reducing cost.


Video Answer


1 Answers

You are right in the sense it reads the directory structure, creates partitions out of it and then updates the hive metastore. In fact more recently, the command was improved to remove non-existing partitions from metastore as well. The example that you are giving is very simple since it has only one level of partition keys. Consider table with multiple partition keys (2-3 partition keys is common in practice). msck repair will have to do a full-tree traversal of all the sub-directories under the table directory, parse the file names, make sure that the file names are valid, check if the partition is already existing in the metastore and then add the only partitions which are not present in the metastore. Note that each listing on the filesystem is a RPC to the namenode (in case of HDFS) or a web-service call in case of S3 or ADLS which can add to significant amount of time. Additionally, in order to figure out if the partition is already present in metastore or not, it needs to do a full listing of all the partitions which metastore knows of for the table. Both these steps can potentially increase the time taken for the command on large tables. The performance of msck repair table was improved considerably recently Hive 2.3.0 (see HIVE-15879 for more details). You may want to tune hive.metastore.fshandler.threads and hive.metastore.batch.retrieve.max to improve the performance of command.

like image 120
Vihang Avatar answered Sep 18 '22 08:09

Vihang