How are HDFS files getting stored on underlying OS filesystem?

Tags: hadoop, hdfs

HDFS is a logical filesystem in Hadoop with a block size of 64 MB. A file on HDFS is ultimately saved on the underlying OS filesystem, say ext4, which uses a block size of 4 KiB.

To my knowledge, for a file on the local filesystem, the OS uses the start and end cylinders of each 4 KiB block on the physical hard disk to retrieve it. HDFS files are also saved on the underlying ext4 filesystem, so they too can only be retrieved through the start and end cylinders of 4 KiB blocks.

If that is the case, HDFS by itself won't increase the speed of data retrieval. So my question is: what technique does HDFS use, with respect to the hard disk, to increase retrieval speed?

Shashikanth Komandoor asked Oct 31 '22 17:10



1 Answer

The retrieval speed from the ext filesystem doesn't change; your reasoning there is correct. What happens is that a large file is split into blocks of, say, 64 MB, which are stored on different machines (DataNodes). The main machine (NameNode) only keeps track of which machine holds which block; the data itself is read directly from the DataNodes, not routed through the NameNode. When the file is processed, multiple machines read their pieces simultaneously, so the work is spread over many disks instead of one. This way, things speed up. It is the same as ten men finishing a building task in 1 day rather than one man in 10 days.
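To put rough numbers on this, here is a minimal Python sketch of the arithmetic, using the 64 MB and 4 KiB block sizes from the question. The function names, the 100 MiB/s disk throughput, and the reader counts are illustrative assumptions, not Hadoop APIs or measured values. The model is idealized: it treats every block as full and ignores network and scheduling overhead.

```python
import math

HDFS_BLOCK = 64 * 1024 * 1024  # 64 MB HDFS block size, as stated in the question
EXT4_BLOCK = 4 * 1024          # 4 KiB ext4 block size, as stated in the question


def hdfs_block_count(file_size: int) -> int:
    """Number of HDFS blocks a file of file_size bytes is split into."""
    return math.ceil(file_size / HDFS_BLOCK)


def ext4_blocks_per_hdfs_block() -> int:
    """Number of ext4 blocks needed to hold one full HDFS block on a DataNode."""
    return HDFS_BLOCK // EXT4_BLOCK


def idealized_read_seconds(file_size: int, disk_mib_per_s: float, readers: int) -> float:
    """Hypothetical model: each reader machine reads whole blocks from its own
    disk at disk_mib_per_s; blocks are handled in rounds of `readers` at a time.
    Every block is counted as full, and network cost is ignored."""
    rounds = math.ceil(hdfs_block_count(file_size) / readers)
    seconds_per_block = HDFS_BLOCK / (disk_mib_per_s * 1024 * 1024)
    return rounds * seconds_per_block


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print("HDFS blocks in 1 GiB:", hdfs_block_count(one_gib))
    print("ext4 blocks per HDFS block:", ext4_blocks_per_hdfs_block())
    for readers in (1, 8, 16):
        t = idealized_read_seconds(one_gib, 100.0, readers)
        print(f"{readers:>2} reader(s): about {t:.2f} s")
```

Each individual disk still reads its 4 KiB ext4 blocks at the same speed as before; the estimated time drops only because the 64 MB blocks are read from several disks at once.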

Abhishek answered Nov 09 '22 01:11
