We had an issue with our ingestion process that would result in partitions being added to a table in Hive, but the path in HDFS didn't actually exist. We've fixed that issue, but we still have these bad partitions. When querying these tables using Tez, we get FileNotFound exception, pointing to the location in HDFS that doesn't exist. If we use MR instead of Tez, the query works (which is very confusing to me), but it's too slow.
Is there a way to list all the partitions that have this probem? MSCK REPAIR
seems to handle the opposite problem, where the data exists in HDFS but there is no partition in Hive.
EDIT: More info. Here's the output of the file not found exception:
java.io.FileNotFoundException: File hdfs://<server>/db/tables/2016/03/14/mytable does not exist.
If I run show partitions <db.mytable>
, I'll get all the partitions, including one for dt=2016-03-14
.
show table extended like '<db.mytable>' partition(dt='2016-03-14'
returns the same location:
location:hdfs://server/db/tables/2016/03/14/mytable
.
MSCK REPAIR TABLE <tablename>
does not provide this facility and I also face this same issue and I found solution for this,
As we know 'msck repair' command add partitions based on directory, So first drop all partitions
hive>ALTER TABLE mytable drop if exists partitions(p<>'');
above command remove all partitions ,
then use msck repair
command then it will create partition from directory present at table location.
hive>msck repair table mytable
It seem MSCK REPAIR TABLE
does not drop partitions that point to missing directories, but it does list these partitions (see Partitions not in metastore:
), so with a little scripting / manual work, you can drop them based on the given list.
hive> create table mytable (i int) partitioned by (p int);
OK
Time taken: 0.539 seconds
hive> !mkdir mytable/p=1;
hive> !mkdir mytable/p=2;
hive> !mkdir mytable/p=3;
hive> msck repair table mytable;
OK
Partitions not in metastore: mytable:p=1 mytable:p=2 mytable:p=3
Repair: Added partition to metastore mytable:p=1
Repair: Added partition to metastore mytable:p=2
Repair: Added partition to metastore mytable:p=3
Time taken: 0.918 seconds, Fetched: 4 row(s)
hive> show partitions mytable;
OK
p=1
p=2
p=3
Time taken: 0.331 seconds, Fetched: 3 row(s)
hive> !rmdir mytable/p=1;
hive> !rmdir mytable/p=2;
hive> !rmdir mytable/p=3;
hive> msck repair table mytable;
OK
Partitions missing from filesystem: mytable:p=1 mytable:p=2 mytable:p=3
Time taken: 0.425 seconds, Fetched: 1 row(s)
hive> show partitions mytable;
OK
p=1
p=2
p=3
Time taken: 0.56 seconds, Fetched: 3 row(s)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With