Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Partition in metastore but path doesn't exist in HDFS

Tags:

hive

hdfs

We had an issue with our ingestion process that would result in partitions being added to a table in Hive, but the path in HDFS didn't actually exist. We've fixed that issue, but we still have these bad partitions. When querying these tables using Tez, we get FileNotFound exception, pointing to the location in HDFS that doesn't exist. If we use MR instead of Tez, the query works (which is very confusing to me), but it's too slow.

Is there a way to list all the partitions that have this probem? MSCK REPAIR seems to handle the opposite problem, where the data exists in HDFS but there is no partition in Hive.

EDIT: More info. Here's the output of the file not found exception:

java.io.FileNotFoundException: File hdfs://<server>/db/tables/2016/03/14/mytable does not exist.

If I run show partitions <db.mytable>, I'll get all the partitions, including one for dt=2016-03-14.

show table extended like '<db.mytable>' partition(dt='2016-03-14' returns the same location: location:hdfs://server/db/tables/2016/03/14/mytable.

like image 704
Andrew Avatar asked Jul 05 '17 16:07

Andrew


2 Answers

MSCK REPAIR TABLE <tablename> does not provide this facility and I also face this same issue and I found solution for this,

As we know 'msck repair' command add partitions based on directory, So first drop all partitions

hive>ALTER TABLE mytable drop if exists partitions(p<>'');

above command remove all partitions ,

then use msck repair command then it will create partition from directory present at table location.

hive>msck repair table mytable
like image 199
Aditya Tambe Avatar answered Oct 22 '22 22:10

Aditya Tambe


It seem MSCK REPAIR TABLE does not drop partitions that point to missing directories, but it does list these partitions (see Partitions not in metastore:), so with a little scripting / manual work, you can drop them based on the given list.

hive> create table mytable (i int) partitioned by (p int);
OK
Time taken: 0.539 seconds

hive> !mkdir mytable/p=1;
hive> !mkdir mytable/p=2;
hive> !mkdir mytable/p=3;

hive> msck repair table mytable;
OK
Partitions not in metastore:    mytable:p=1 mytable:p=2 mytable:p=3
Repair: Added partition to metastore mytable:p=1
Repair: Added partition to metastore mytable:p=2
Repair: Added partition to metastore mytable:p=3
Time taken: 0.918 seconds, Fetched: 4 row(s)

hive> show partitions mytable;
OK
p=1
p=2
p=3
Time taken: 0.331 seconds, Fetched: 3 row(s)

hive> !rmdir mytable/p=1;
hive> !rmdir mytable/p=2;
hive> !rmdir mytable/p=3;

hive> msck repair table mytable;
OK
Partitions missing from filesystem: mytable:p=1 mytable:p=2 mytable:p=3
Time taken: 0.425 seconds, Fetched: 1 row(s)

hive> show partitions mytable;
OK
p=1
p=2
p=3
Time taken: 0.56 seconds, Fetched: 3 row(s)
like image 2
David דודו Markovitz Avatar answered Oct 22 '22 21:10

David דודו Markovitz