 

how to find file from blockName in HDFS hadoop

Tags:

hadoop

hdfs

What's the easiest way to find the file associated with a block in HDFS, given a block name/ID?

Inder Singh asked Jun 04 '12 12:06


3 Answers

Not sure when this was introduced, but you can do this:

hdfs fsck -blockId <block_id>

hdfs fsck -blockId blk_1100790203
Connecting to namenode 
FSCK started by hdfs 

Block Id: blk_1100790203
Block belongs to: /tmp/1447685899336.txt
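If you want just the path (for use in a script), a minimal shell sketch can pull it out of the report. It assumes the report contains a line of the form "Block belongs to: <path>", as in the sample output in this answer; the function name block_owner is hypothetical, and the exact report wording may differ between Hadoop versions.

```shell
# Hypothetical helper: read `hdfs fsck -blockId` output on stdin and print the
# owning path. Assumes a report line of the form "Block belongs to: <path>".
block_owner() {
  # -n suppresses normal output; the p flag prints only lines where the
  # "Block belongs to: " prefix was found and stripped.
  sed -n 's/^Block belongs to: //p'
}
```

Usage: hdfs fsck -blockId blk_1100790203 | block_owner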
Abhijith answered Oct 20 '22 00:10


Option 1: if you include the generation stamp in the block ID, you must also add the .meta suffix:

$ hdfs fsck -blockId blk_1073823706_82968.meta

Option 2: use the block ID without the generation stamp:

$ hdfs fsck -blockId blk_1073823706
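When you start from a name that may carry a generation stamp and/or a .meta suffix (for example, one copied from a DataNode's data directory, where blocks are stored as blk_<id> with a companion blk_<id>_<genstamp>.meta file), a minimal shell sketch can reduce it to the Option 2 form. The function name block_id_arg is hypothetical; it does pure string manipulation and assumes input of the form blk_<id>, blk_<id>_<genstamp> or blk_<id>_<genstamp>.meta.

```shell
# Hypothetical helper: normalize a block name to "blk_<id>" (Option 2 form).
# Accepts blk_<id>, blk_<id>_<genstamp>, or blk_<id>_<genstamp>.meta.
# Note: POSIX sh has no "local", so name/id are ordinary shell variables.
block_id_arg() {
  name=${1%.meta}          # drop a trailing .meta suffix, if present
  id=${name#blk_}          # drop the leading blk_ prefix
  id=${id%%_*}             # drop _<genstamp>, if present
  printf 'blk_%s\n' "$id"
}
```

Usage: hdfs fsck -blockId "$(block_id_arg blk_1073823706_82968.meta)"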
secfree answered Oct 20 '22 00:10


The long and painful way, assuming you have read access to all the files (and execute permission on the directories):

hadoop fsck / -files -blocks | grep blk_520275863902385418_1002 -B 20

Then scan back up from your block match to the previous file name:

/hadoop/mapred/system/jobtracker.info 4 bytes, 1 block(s):  OK
0. blk_520275863902385418_1002 len=4 repl=1

In this case, blk_5202... is part of the /hadoop/mapred/system/jobtracker.info file.
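The manual "scan back up" step can be automated with awk, which remembers the most recent file line and prints it when the wanted block appears. This is a minimal sketch under stated assumptions: the function name find_file_for_block is hypothetical; file lines are assumed to contain " bytes, " and "block(s):" and block lines to start with "<n>." as in the sample output in this answer; file paths are assumed to contain no spaces; and an optional "BP-...:" block-pool prefix on block names (seen in some newer Hadoop versions) is stripped. It still walks the whole namespace, so it is no faster than the grep approach; it only removes the manual scan.

```shell
# Hypothetical helper: read `fsck / -files -blocks` output on stdin and print
# the path of the file that owns the given block (with or without genstamp).
find_file_for_block() {
  awk -v want="$1" '
    # File lines look like: "<path> <size> bytes, ... block(s): ..."
    / bytes, / && /block[(]s[)]:/ { file = $1; next }
    # Block lines look like: "<n>. blk_<id>_<genstamp> len=... repl=..."
    $1 ~ /^[0-9]+[.]$/ {
      blk = $2
      sub(/^.*:/, "", blk)   # drop a "BP-...:" block-pool prefix, if any
      # Exact match, or match on the ID followed by "_<genstamp>".
      if (blk == want || index(blk, want "_") == 1) { print file; exit }
    }
  '
}
```

Usage: hadoop fsck / -files -blocks | find_file_for_block blk_520275863902385418_1002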

Programmatically, there isn't an interface to the name node that allows you to search by block ID. You could, however, look into the source for the secondary name node to see how it consolidates the edits, then experiment on the saved output from the secondary name node (rather than risk working on the live name node file).

Good luck!

Chris White answered Oct 20 '22 01:10