What's the easiest way to find the file associated with a block in HDFS, given a block name/ID?
Not sure when this was introduced, but you can do this:
hdfs fsck -blockId <block_id>
$ hdfs fsck -blockId blk_1100790203
Connecting to namenode
FSCK started by hdfs
Block Id: blk_1100790203
Block belongs to: /tmp/1447685899336.txt
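If you only want the path (for scripting), something like the following might work. It is a minimal sketch that assumes the output contains a "Block belongs to:" line exactly as shown above; `block_owner` is a made-up helper name, not part of Hadoop.

```shell
# Hypothetical helper: read `hdfs fsck -blockId ...` output on stdin and
# print only the path from the "Block belongs to:" line.
# Assumes the line format shown in the sample output above.
block_owner() {
  sed -n 's/^Block belongs to: //p'
}

# Usage on a real cluster (not run here):
#   hdfs fsck -blockId blk_1100790203 | block_owner
```

If the block is unknown to the name node there is no such line, so the helper prints nothing.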
Option 1: if you pass the blockId together with its generationStamp, the .meta suffix is required:
$ hdfs fsck -blockId blk_1073823706_82968.meta
Option 2: use the blockId without the generationStamp:
$ hdfs fsck -blockId blk_1073823706
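If your block names come from logs or data node directories, they may carry a generation stamp and/or a .meta suffix. A small sketch for normalizing them to the Option 2 form; `strip_genstamp` is a hypothetical helper name, and it assumes names of the form blk_<id>, blk_<id>_<genstamp>, or blk_<id>_<genstamp>.meta.

```shell
# Hypothetical helper: reduce a block name to the bare blk_<id> form.
# Assumed input shapes: blk_<id>, blk_<id>_<genstamp>, blk_<id>_<genstamp>.meta
# (the id may be negative). Anything else is printed unchanged.
strip_genstamp() {
  printf '%s\n' "$1" | sed -E 's/^(blk_-?[0-9]+)(_[0-9]+)?(\.meta)?$/\1/'
}

# Usage on a real cluster (not run here):
#   hdfs fsck -blockId "$(strip_genstamp blk_1073823706_82968.meta)"
```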
The long and painful way, assuming you have read access to all the files (and execute for the directories):
hadoop fsck / -files -blocks | grep blk_520275863902385418_1002 -B 20
Then scan back up from your block match to the previous file name:
/hadoop/mapred/system/jobtracker.info 4 bytes, 1 block(s): OK
0. blk_520275863902385418_1002 len=4 repl=1
In this case, blk_5202... is part of the file /hadoop/mapred/system/jobtracker.info.
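To skip the manual scan back, you could let awk remember the last file line it saw. This is only a sketch: `find_block_owner` is a made-up name, and it assumes file lines start with "/" and that paths contain no spaces, as in the output above.

```shell
# Hypothetical helper: read `fsck / -files -blocks` output on stdin and print
# the path of the file that owns the block named in $1.
# Assumptions: file lines begin with "/", paths contain no spaces, and block
# lines follow the line of the file they belong to.
find_block_owner() {
  awk -v blk="$1" '
    /^\//          { path = $1 }          # remember the most recent file line
    index($0, blk) { print path; exit }   # first line mentioning the block
  '
}

# Usage on a real cluster (not run here):
#   hadoop fsck / -files -blocks | find_block_owner blk_520275863902385418_1002
```

Unlike grep -B 20, this does not depend on the file line being within 20 lines of the match, which matters for files with many blocks.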
Programmatically, there isn't an interface to the name node that allows you to search by block ID. You could look into the source for the secondary name node to see how it consolidates the edits, then experiment on the saved output from the secondary name node (rather than risk working on the live name node file).
Good luck!