Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data block size in HDFS, why 64MB?

The default data block size of HDFS/Hadoop is 64MB. The block size in the disk is generally 4KB.

What does 64MB block size mean? ->Does it mean that the smallest unit of reading from disk is 64MB?

If yes, what is the advantage of doing that?-> easy for continuous access of large files in HDFS?

Can we do the same by using the disk's original 4KB block size?

like image 990
dykw Avatar asked Oct 20 '13 03:10

dykw


People also ask

Why is a block size in HDFS so large?

Why is a Block in HDFS So Large? HDFS blocks are huge than the disk blocks, and the explanation is to limit the expense of searching. The time or cost to transfer the data from the disk can be made larger than the time to seek for the beginning of the block by simply improving the size of blocks significantly.

What is the usual size of the data block on HDFS?

A typical block size used by HDFS is 128 MB.

What is a block in HDFS why block size 128mb?

Block size means smallest unit of data in file system. In HDFS, block size can be configurable as per requirements, but default is 128 MB. Traditional file systems like of Linux have default block size of 4 KB.

What is the default size of HDFS data block 16mb 32mb 64MB 128mb?

The default size of the HDFS data block is 128 MB. If blocks are small, there will be too many blocks in Hadoop HDFS and thus too much metadata to store.


1 Answers

What does 64MB block size mean?

The block size is the smallest data unit that a file system can store. If you store a file that's 1k or 60Mb, it'll take up one block. Once you cross the 64Mb boundary, you need a second block.

If yes, what is the advantage of doing that?

HDFS is meant to handle large files. Let's say you have a 1000Mb file. With a 4k block size, you'd have to make 256,000 requests to get that file (1 request per block). In HDFS, those requests go across a network and come with a lot of overhead. Each request has to be processed by the Name Node to determine where that block can be found. That's a lot of traffic! If you use 64Mb blocks, the number of requests goes down to 16, significantly reducing the cost of overhead and load on the Name Node.

like image 149
bstempi Avatar answered Oct 02 '22 09:10

bstempi